Fundamental limits of remote estimation of autoregressive
Markov processes under communication constraints 

Jhelum Chakravorty and Aditya Mahajan 


Abstract —The fundamental limits of remote estimation of autoregressive
Markov processes under communication constraints are presented.
The remote estimation system consists of a sensor and an estimator. The
sensor observes a discrete-time autoregressive Markov process driven
by a symmetric and unimodal innovations process. At each time, the
sensor either transmits the current state of the Markov process or does
not transmit at all. The estimator estimates the Markov process based
on the transmitted observations. In such a system, there is a trade-off
between communication cost and estimation accuracy. Two fundamental
limits of this trade-off are characterized for infinite horizon discounted
cost and average cost setups. First, when each transmission is costly,
we characterize the minimum achievable cost of communication plus
estimation error. Second, when there is a constraint on the average
number of transmissions, we characterize the minimum achievable
estimation error. Transmission and estimation strategies that achieve
these fundamental limits are also identified.

Index Terms —Constrained Markov decision processes, event-based 
communication, real-time communication, remote estimation, renewal 
theory, threshold strategies 


I. Introduction 

A. Motivation and literature overview 

In many applications, such as networked control systems, sensor
and surveillance networks, and transportation networks, data
must be transmitted sequentially from one node to another under
a strict delay deadline. In many such real-time communication
systems, the transmitter is a battery powered device that transmits 
over a wireless packet-switched network; the cost of switching on 
the radio and transmitting a packet is significantly more important 
than the size of the data packet. Therefore, the transmitter does not 
transmit all the time; but when it does transmit, the transmitted packet 
is as big as needed to communicate the current source realization. 
In this paper, we characterize fundamental trade-offs between the 
estimation error (or distortion) and the cost or average number of 
transmissions in such systems. 

In particular, we consider a sensor that observes a first-order 
autoregressive Markov process. At each time instant, based on the 
current state of the process and the history of its past decisions, the 
sensor determines whether or not to transmit the current state. If the 
sensor does not transmit, the receiver must estimate the state using the 
previously transmitted values. A per-step distortion function measures 
the estimation error. We investigate two fundamental trade-offs in this 
setup: (i) when there is a cost associated with each communication, 
what is the minimum expected estimation error plus communication 
cost; and (ii) when there is a constraint on the average number of 
transmissions, what is the minimum estimation error. For both these 
cases, we characterize the transmission and estimation strategies that
achieve the optimal trade-off. 

Two approaches have been used in the literature to investigate real-time
or zero-delay communication. The first approach considers coding
of individual sequences [1]-[4]; the second approach considers
coding of Markov sources [5]-[10]. The model presented above fits
with the latter approach. In particular, it may be viewed as real-time 
transmission, which is noiseless but expensive. In most of the results 
in the literature, the focus has been on identifying sufficient statistics 
(or information states) at the transmitter and the receiver; for some 
of the models, a dynamic programming decomposition has also been 
derived. However, very little is known about the solution of these 
dynamic programs. 

The communication system described above is much simpler than 
the general real-time communication setup due to the following 
feature: whenever the transmitter transmits, it sends the current state 
to the receiver. These transmitted events reset the estimation error to 
zero. We exploit these special features to identify an analytic solution 
to the dynamic program corresponding to the above communication 
system. 

A static (one shot) remote estimation problem was first considered
in [?] in the context of information gathering in organizations.
The problem of optimal off-line choice of measurement times was
considered in [?], whereas the problem of optimal online choice
of measurement times was considered in [?]. The closely related
problem of event-based sampling (also called Lebesgue sampling)
was considered in [?]. In addition, several variations of the remote
estimation problem have been considered in the literature. The most
closely related models are [?], [15]-[18], [20], which are summarized
below. Other related work includes censoring sensors [21], [22]
(where a sensor takes a measurement and decides whether to transmit
it or not; in the context of sequential hypothesis testing), estimation
with measurement cost [23]-[25] (where the receiver decides when
the sensor should transmit), sensor sleep scheduling [26]-[29] (where
the sensor is allowed to sleep for a pre-specified amount of time), and
event-based communication [30]-[32] (where the sensor transmits
when a certain event takes place). We contrast our model with [?],
[18], [20] below.

In [15], optimal remote estimation of i.i.d. Gaussian processes is
investigated under a constraint on the total number of transmissions.
The optimal estimation strategy is derived when the transmitter is
restricted to be of threshold-type.

In [16], the optimal remote estimation of a continuous-time autoregressive
Markov process driven by Brownian motion is considered
under a constraint on the number of transmissions. The optimal
transmission strategy is derived under an assumption on the structure
of the optimal estimation strategy. It is shown that the optimal
transmission strategy is of threshold-type, where the thresholds
are determined by solving a sequence of nested optimal stopping
problems.

In [17], optimal remote estimation of Gauss-Markov processes is
investigated when there is a cost associated with each transmission.
The optimal transmission strategy is derived when the estimation
strategy is restricted to be Kalman-like.
In [?], [18], [20], optimal remote estimation of autoregressive
Markov processes is investigated when there is a cost associated with
each transmission. It is assumed that the autoregressive process is
driven by a symmetric and unimodal noise process, but no assumption
is imposed on the structure of the transmitter or the receiver. Using
different solution approaches ([?], [18] use majorization theory while
[20] uses person-by-person optimality), it is shown that the optimal
transmission strategy is threshold-based and the optimal estimation
strategy is Kalman-like (the precise form of these strategies is
stated in Theorem 1). Thus, the optimal transmission and estimation
strategies are easy to implement.

An immediate question is how to identify the optimal transmission
and estimation strategies for a given communication cost. It is shown
in [?], [18], [20] that the optimal estimation strategy does not depend
on the communication cost while the optimal transmission strategy
can be computed by solving an appropriate dynamic program. However,
the dynamic programs presented in [18], [20] do not exploit
the threshold structure of the optimal strategy.

In this paper, we provide an alternative approach to identify the
optimal transmission strategies. We consider the infinite horizon remote
estimation problem and show that there is no loss of optimality
in restricting attention to transmission strategies that use a
time-homogeneous threshold. To determine the optimal threshold, we first
provide computable expressions for the performance of a generic
threshold-based transmission strategy and then use these expressions
to identify the best threshold-based strategy. Thus, we show that the
structure of optimal strategies derived in [?], [18], [20] is also useful
to compute the optimal strategy.

B. Contributions 

We investigate remote estimation for two models of Markov
processes: discrete state autoregressive Markov processes (Model A)
and continuous state autoregressive Markov processes (Model B),
both driven by a symmetric and unimodal innovations process. We consider
two infinite horizon setups: the discounted setup with discount factor
$\beta \in (0,1)$ and the long-term average setup, which we denote by
$\beta = 1$ for uniformity of notation. For both models, we consider two
fundamental trade-offs:

1) Costly communication: When each transmission costs $\lambda$ units,
what is the minimum achievable cost of communication plus
estimation error, which we denote by $C^*_\beta(\lambda)$?

2) Constrained communication: When the average number of
transmissions is constrained by $\alpha \in (0,1)$, what is the
minimum achievable estimation error, which we denote by
$D^*_\beta(\alpha)$ and refer to as the distortion-transmission trade-off?

We completely characterize both trade-offs. In particular,

• In Model A, $C^*_\beta(\lambda)$ is continuous, increasing, piecewise-linear,
and concave in $\lambda$ while $D^*_\beta(\alpha)$ is continuous, decreasing,
piecewise-linear, and convex in $\alpha$. We derive explicit expressions
(in terms of simple matrix products) for the corner points of both
these curves.

• In Model B, $C^*_\beta(\lambda)$ is continuous, increasing, and concave in $\lambda$
while $D^*_\beta(\alpha)$ is continuous, decreasing, and convex in $\alpha$. We
derive an algorithmic procedure to compute these curves by using
solutions of Fredholm integral equations of the second kind.
When the innovations process is Gaussian, we characterize how
these curves scale as a function of the variance $\sigma^2$.

We also explicitly identify transmission and estimation strategies
that achieve any point on these trade-off curves. For all cases, we
show that: (i) there is no loss of optimality in restricting attention to
time-homogeneous strategies; (ii) the optimal estimation strategy is
Kalman-like; (iii) the optimal transmission strategy is a randomized
threshold-based strategy for Model A and is a deterministic
threshold-based strategy for Model B.

Fig. 1: Block diagram of a remote estimation system.

In addition,

• In Model A, the optimal threshold as a function of $\lambda$ or $\alpha$ can
be computed using a look-up table.

• In Model B, the optimal threshold as a function of $\lambda$ or $\alpha$ can be
computed using the solutions of Fredholm integral equations of
the second kind.

C. Notation 

We use the following notation. $\mathbb{Z}$, $\mathbb{Z}_{\ge 0}$ and $\mathbb{Z}_{>0}$ denote the set of
integers, the set of non-negative integers and the set of strictly positive
integers, respectively. Similarly, $\mathbb{R}$, $\mathbb{R}_{\ge 0}$ and $\mathbb{R}_{>0}$ denote the set of
reals, the set of non-negative reals and the set of strictly positive reals,
respectively. Upper-case letters (e.g., $X$, $Y$) denote random variables;
corresponding lower-case letters (e.g., $x$, $y$) denote their realizations.
$X_{1:t}$ is a short hand notation for the vector $(X_1, \dots, X_t)$. Given a
matrix $A$, $A_{ij}$ denotes its $(i,j)$-th element, $A_i$ denotes its $i$-th row, and
$A^\top$ denotes its transpose. We index the matrices by sets of the form
$\{-k, \dots, k\}$; so the indices take both positive and negative values.
For $k \in \mathbb{Z}_{>0}$, $I_k$ denotes the identity matrix of dimension $k \times k$,
and $\mathbf{1}_k$ denotes the $k \times 1$ vector of ones.

$\langle v, w \rangle$ denotes the inner product between vectors $v$ and $w$, $\mathbb{P}(\cdot)$
denotes the probability of an event, $\mathbb{E}[\cdot]$ denotes the expectation
of a random variable, and $\mathbb{1}\{\cdot\}$ denotes the indicator function of a
statement. We follow the convention of calling a sequence $\{a_k\}_{k=0}^{\infty}$
increasing when $a_1 \le a_2 \le \cdots$. If all the inequalities are strict, then
we call the sequence strictly increasing.

II. Model and problem formulation 

A. Model 

Consider the following two models of a discrete-time Markov
process $\{X_t\}_{t=0}^{\infty}$ with the initial state $X_0 = 0$ and for $t \ge 0$,

$$X_{t+1} = a X_t + W_t, \tag{1}$$

where $\{W_t\}_{t=0}^{\infty}$ is an i.i.d. innovations process. We consider two
specific models:

• Model A: $a, X_t, W_t \in \mathbb{Z}$ and $W_t$ is distributed according to
a unimodal and symmetric pmf (probability mass function) $p$,
i.e., for all $e \in \mathbb{Z}_{\ge 0}$, $p_e = p_{-e}$ and $p_e \ge p_{e+1}$. To avoid trivial
cases, we assume $p_0$ is strictly less than 1.

• Model B: $a, X_t, W_t \in \mathbb{R}$ and $W_t$ is distributed according to a
unimodal, differentiable and symmetric pdf (probability density
function) $\phi$, i.e., for all $e \in \mathbb{R}_{\ge 0}$, $\phi(e) = \phi(-e)$ and for any
$\delta \in \mathbb{R}_{>0}$, $\phi(e) \ge \phi(e + \delta)$.

For uniformity of notation, define $\mathbb{X}$ to be equal to $\mathbb{Z}$ for Model A
and equal to $\mathbb{R}$ for Model B. $\mathbb{X}_{\ge 0}$ and $\mathbb{X}_{>0}$ are defined similarly.

A sensor sequentially observes the process and at each time,
chooses whether or not to transmit the current state. This decision is
denoted by $U_t \in \{0,1\}$, where $U_t = 0$ denotes no transmission and
$U_t = 1$ denotes transmission. The decision to transmit is made using
a transmission strategy $f = \{f_t\}_{t=0}^{\infty}$, where

$$U_t = f_t(X_{0:t}, U_{0:t-1}). \tag{2}$$

We use the short-hand notation $X_{0:t}$ to denote the sequence
$(X_0, \dots, X_t)$. Similar interpretations hold for $U_{0:t-1}$.

The transmitted symbol, which is denoted by $Y_t$, is given by

$$Y_t = \begin{cases} X_t, & \text{if } U_t = 1; \\ \mathfrak{E}, & \text{if } U_t = 0, \end{cases}$$

where $Y_t = \mathfrak{E}$ denotes no transmission.

The receiver sequentially observes $\{Y_t\}_{t=0}^{\infty}$ and generates an
estimate $\{\hat X_t\}_{t=0}^{\infty}$, $\hat X_t \in \mathbb{X}$, using an estimation strategy
$g = \{g_t\}_{t=0}^{\infty}$, i.e.,

$$\hat X_t = g_t(Y_{0:t}). \tag{3}$$


The fidelity of the estimation is measured by a per-step distortion
$d(X_t - \hat X_t)$.

For both models, we assume the following:

• $d(0) = 0$ and for $e \neq 0$, $d(e) > 0$;

• $d(\cdot)$ is even, i.e., for all $e$, $d(e) = d(-e)$;

• $d(\cdot)$ is increasing, i.e., for $e_1 > e_2 > 0$, $d(e_1) \ge d(e_2)$;

• For Model B, we assume that $d(\cdot)$ is differentiable.

We also specialize our results to the following special case of
Model B:

• Gauss-Markov model: the density $\phi$ is zero-mean Gaussian
with variance $\sigma^2$ and the distortion is quadratic, i.e.,

$$\phi(e) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\big(-e^2/(2\sigma^2)\big) \quad \text{and} \quad d(e) = e^2.$$


B. Performance measures 

Given a transmission and estimation strategy $(f, g)$ and a discount
factor $\beta \in (0,1]$, we define the expected distortion and the expected
number of transmissions as follows. For $\beta \in (0,1)$, the expected
discounted distortion is given by

$$D_\beta(f,g) := (1-\beta)\, \mathbb{E}^{(f,g)}\Big[ \sum_{t=0}^{\infty} \beta^t d(X_t - \hat X_t) \,\Big|\, X_0 = 0 \Big] \tag{4}$$

and for $\beta = 1$, the expected long-term average distortion is given by

$$D_1(f,g) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{(f,g)}\Big[ \sum_{t=0}^{T-1} d(X_t - \hat X_t) \,\Big|\, X_0 = 0 \Big]. \tag{5}$$

Similarly, for $\beta \in (0,1)$, the expected discounted number of
transmissions is given by

$$N_\beta(f,g) := (1-\beta)\, \mathbb{E}^{(f,g)}\Big[ \sum_{t=0}^{\infty} \beta^t U_t \,\Big|\, X_0 = 0 \Big] \tag{6}$$

and for $\beta = 1$, the expected long-term average number of transmissions
is given by

$$N_1(f,g) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{(f,g)}\Big[ \sum_{t=0}^{T-1} U_t \,\Big|\, X_0 = 0 \Big]. \tag{7}$$

Remark 1 We use a normalizing factor of $(1 - \beta)$ to have a
unified scaling for both discounted and long-term average setups.
In particular, we will show that for any strategy $(f, g)$,

$$C_1(f,g;\lambda) = \lim_{\beta \uparrow 1} C_\beta(f,g;\lambda), \quad \text{and} \quad D_1(f,g) = \lim_{\beta \uparrow 1} D_\beta(f,g).$$

Similar notation is used in [33].


C. Problem formulations 

We are interested in the following two optimization problems. 

Problem 1 (Costly communication) In the model of Section II-A,
given a discount factor $\beta \in (0,1]$ and a communication cost
$\lambda \in \mathbb{R}_{>0}$, find a transmission and estimation strategy $(f^*, g^*)$ such that

$$C^*_\beta(\lambda) := C_\beta(f^*, g^*; \lambda) = \inf_{(f,g)} C_\beta(f, g; \lambda), \tag{8}$$

where

$$C_\beta(f, g; \lambda) := D_\beta(f,g) + \lambda N_\beta(f,g)$$

is the total communication cost and the infimum in (8) is taken over
all history-dependent strategies.

Problem 2 (Constrained communication) In the model of
Section II-A, given a discount factor $\beta \in (0,1]$ and a constraint
$\alpha \in (0,1)$, find a transmission and estimation strategy $(f^*, g^*)$ such
that

$$D^*_\beta(\alpha) := D_\beta(f^*, g^*) = \inf_{(f,g) : N_\beta(f,g) \le \alpha} D_\beta(f, g), \tag{9}$$

where the infimum is taken over all history-dependent strategies.

Remark 2 It can be shown for $|a| > 1$ that $\lim_{\alpha \to 0} D^*_1(\alpha) = \infty$
and $\lim_{\alpha \to 1} D^*_\beta(\alpha) = 0$.

The function $D^*_\beta(\alpha)$, $\beta \in (0,1]$, represents the minimum expected
distortion that can be achieved when the expected number of
transmissions is less than or equal to $\alpha$. It is analogous to the
distortion-rate function in Information Theory; for that reason, we call it the
distortion-transmission function.


III. The main results 


A. Structure of optimal strategies 

To completely characterize the functions $C^*_\beta(\lambda)$ and $D^*_\beta(\alpha)$, we
first establish the structure of the optimal transmitter and receiver.

Theorem 1 (Structural results) Consider Problem 1 for $\beta \in (0,1]$.
Then, for both Models A and B, we have the following.

1) Structure of optimal estimation strategy: The optimal estimation
strategy is as follows: $\hat X_0 = 0$ and for $t > 0$,

$$\hat X_t = \begin{cases} Y_t, & \text{if } Y_t \neq \mathfrak{E}; \\ a \hat X_{t-1}, & \text{if } Y_t = \mathfrak{E}, \end{cases}$$

or equivalently,

$$\hat X_t = \begin{cases} X_t, & \text{if } U_t = 1; \\ a \hat X_{t-1}, & \text{if } U_t = 0. \end{cases}$$

We denote this strategy by $g^*$.

2) Structure of optimal transmission strategy: Define $E_t := X_t - a \hat X_{t-1}$,
which we call the error process. Then there exists a
time-invariant threshold $k$ such that the transmission strategy

$$U_t = f^{(k)}(E_t) := \begin{cases} 1, & \text{if } |E_t| \ge k; \\ 0, & \text{if } |E_t| < k \end{cases} \tag{10}$$

is optimal.

The proof of the theorem is given in Section V.
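The two strategies in Theorem 1 are simple enough to simulate directly. The following is a minimal sketch, not the authors' implementation: it assumes Model A with an innovations pmf supplied as a Python dict, and the function name `simulate` and its arguments are hypothetical. It runs the threshold rule (10) together with the Kalman-like estimator and reports the empirical long-term averages of the distortion and of the number of transmissions.

```python
import random


def simulate(k, a, pmf, d, horizon, seed=0):
    """Simulate threshold transmitter f^(k) with the Kalman-like estimator g*.

    pmf maps integer innovation values to probabilities (Model A).
    Returns (average distortion, average number of transmissions).
    """
    rng = random.Random(seed)
    values, weights = zip(*sorted(pmf.items()))
    x = 0             # X_0 = 0
    x_hat_prev = 0    # estimate carried over from the previous step
    total_d = 0.0
    total_u = 0
    for _ in range(horizon):
        e = x - a * x_hat_prev                     # error process E_t
        u = 1 if abs(e) >= k else 0                # threshold rule (10)
        x_hat = x if u == 1 else a * x_hat_prev    # Kalman-like estimate
        total_d += d(x - x_hat)
        total_u += u
        x_hat_prev = x_hat
        x = a * x + rng.choices(values, weights)[0]  # X_{t+1} = a X_t + W_t
    return total_d / horizon, total_u / horizon


if __name__ == "__main__":
    pmf = {-1: 0.3, 0: 0.4, 1: 0.3}   # birth-death innovations with p = 0.3
    print(simulate(2, 1, pmf, abs, 200_000, seed=1))
```

With threshold 0 the sensor transmits at every step, so the distortion is exactly zero. For larger thresholds the empirical averages can be compared against the long-term average expressions derived later in the paper.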

Similar structural results were established for the finite horizon
setup in [?], [18], [20], which we use to establish Theorem 1. See
Section V for details. Transmission strategies of the form (10) are
also called event-driven transmission or delta sampling.

*Footnote to Remark 2: For $|a| > 1$, a symmetric Markov chain as given by (1) does not have a
stationary distribution. Therefore, in the limit of no transmission, the expected
long-term average distortion diverges to $\infty$.

Remark 3 Each transmission resets the state of the error process to
$w \in \mathbb{X}$ with probability $p_w$ in Model A and with probability density
$\phi(w)$ in Model B. In between the transmissions, the error process
evolves in a Markovian manner. Thus $\{E_t\}_{t=0}^{\infty}$ is a regenerative
process.


B. Performance of generic threshold-based strategies 

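Let $\mathcal{F}$ denote the class of all time-homogeneous threshold-based
strategies of the form (10). For $\beta \in (0,1]$ and $e \in \mathbb{X}$, define the
following for a system that starts in state $e$ and follows strategy $f^{(k)}$:

• $L^{(k)}_\beta(e)$: the expected distortion until the first transmission;

• $M^{(k)}_\beta(e)$: the expected time until the first transmission;

• $D^{(k)}_\beta(e)$: the expected distortion;

• $N^{(k)}_\beta(e)$: the expected number of transmissions;

• $C^{(k)}_\beta(e; \lambda)$: the expected total cost, i.e.,

$$C^{(k)}_\beta(e; \lambda) = D^{(k)}_\beta(e) + \lambda N^{(k)}_\beta(e), \quad \lambda \ge 0.$$

Note that $D^{(k)}_\beta(0) = D_\beta(f^{(k)}, g^*)$, $N^{(k)}_\beta(0) = N_\beta(f^{(k)}, g^*)$ and
$C^{(k)}_\beta(0; \lambda) = C_\beta(f^{(k)}, g^*; \lambda)$.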

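Define $S^{(k)}$ as follows:

$$S^{(k)} := \begin{cases} \{-(k-1), \dots, k-1\}, & \text{for Model A}; \\ (-k, k), & \text{for Model B}. \end{cases}$$

Under strategy $f^{(k)}$, the transmitter does not transmit if $E_t \in S^{(k)}$.
For that reason, we call $S^{(k)}$ the silent set. Define the linear operator
$\mathcal{B}^{(k)}$ as follows:

• Model A: For any $v \colon S^{(k)} \to \mathbb{R}$, define operator $\mathcal{B}^{(k)}$ as

$$[\mathcal{B}^{(k)} v](e) := \sum_{n \in S^{(k)}} p_{n - ae}\, v(n), \quad \forall e \in S^{(k)}.$$

• Model B: For any $v \colon S^{(k)} \to \mathbb{R}$, define operator $\mathcal{B}^{(k)}$ as

$$[\mathcal{B}^{(k)} v](e) := \int_{S^{(k)}} \phi(n - ae)\, v(n)\, dn, \quad \forall e \in S^{(k)}.$$

Recall from Remark 3 that the state $E_t$ evolves in a Markovian
manner until the first transmission. We may equivalently consider the
Markov process until it is absorbed in $(-\infty, -k] \cup [k, \infty)$. Thus, from the
balance equation for Markov processes, we have for all $e \in S^{(k)}$,

$$L^{(k)}_\beta(e) = d(e) + \beta [\mathcal{B}^{(k)} L^{(k)}_\beta](e), \tag{11}$$

$$M^{(k)}_\beta(e) = 1 + \beta [\mathcal{B}^{(k)} M^{(k)}_\beta](e). \tag{12}$$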

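Lemma 1 For any $\beta \in (0,1]$, equations (11) and (12) have unique
and bounded solutions $L^{(k)}_\beta(e)$ and $M^{(k)}_\beta(e)$ that are

(a) strictly increasing in $k$,

(b) continuous and differentiable in $k$ for Model B,

(c) $\lim_{\beta \uparrow 1} L^{(k)}_\beta(e) = L^{(k)}_1(e)$ and $\lim_{\beta \uparrow 1} M^{(k)}_\beta(e) = M^{(k)}_1(e)$, for all $e$.

The proof of the lemma is given in Appendix A.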


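Theorem 2 (Renewal relationships) For any $\beta \in (0,1]$, the
performance of strategy $f^{(k)}$ in both Models A and B is given as follows:

1) $D_\beta(f^{(0)}, g^*) = 0$, $N_\beta(f^{(0)}, g^*) = 1$, and $C_\beta(f^{(0)}, g^*; \lambda) = \lambda$.

2) For $k \in \mathbb{X}_{>0}$,

$$D_\beta(f^{(k)}, g^*) = \frac{L^{(k)}_\beta(0)}{M^{(k)}_\beta(0)}, \qquad N_\beta(f^{(k)}, g^*) = \frac{1}{M^{(k)}_\beta(0)} - (1-\beta),$$

and

$$C_\beta(f^{(k)}, g^*; \lambda) = \frac{L^{(k)}_\beta(0) + \lambda}{M^{(k)}_\beta(0)} - \lambda(1-\beta).$$

The proof of the theorem is given in Section VI.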

Remark 4 There is a $-(1-\beta)$ term in the expression of
$N_\beta(f^{(k)}, g^*)$ because for $k > 0$, $U_0 = 0$. Had we defined $U_0 = 1$, then we
would have obtained the usual renewal relationship of
$N_\beta(f^{(k)}, g^*) = 1/M^{(k)}_\beta(0)$.

Thus, to compute $D_\beta(f^{(k)}, g^*)$ and $N_\beta(f^{(k)}, g^*)$, one needs
to compute only $L^{(k)}_\beta(0)$ and $M^{(k)}_\beta(0)$. Computation of the latter
expressions is given in the next section.
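As a quick sanity check of the renewal relationships, the smallest nontrivial threshold in Model A can be worked out by hand. The derivation below is an illustrative sketch: it takes the balance equations to be $L = d + \beta \mathcal{B} L$ and $M = 1 + \beta \mathcal{B} M$ on the silent set, and the renewal formulas to be $D = L(0)/M(0)$ and $N = 1/M(0) - (1-\beta)$. For $k = 1$ the silent set is the single state $\{0\}$.

```latex
\begin{align*}
S^{(1)} &= \{0\}, \qquad [\mathcal{B}^{(1)} v](0) = p_0\, v(0), \\
L^{(1)}_\beta(0) &= d(0) + \beta p_0 L^{(1)}_\beta(0)
   \;\Longrightarrow\; L^{(1)}_\beta(0) = 0, \\
M^{(1)}_\beta(0) &= 1 + \beta p_0 M^{(1)}_\beta(0)
   \;\Longrightarrow\; M^{(1)}_\beta(0) = \frac{1}{1 - \beta p_0}, \\
D_\beta(f^{(1)}, g^*) &= \frac{L^{(1)}_\beta(0)}{M^{(1)}_\beta(0)} = 0, \\
N_\beta(f^{(1)}, g^*) &= (1 - \beta p_0) - (1 - \beta) = \beta (1 - p_0).
\end{align*}
```

For instance, with $p_0 = 0.4$ and $\beta = 0.9$ this gives $N_\beta(f^{(1)}, g^*) = 0.54$, which agrees with the $k = 1$ entry reported for the birth-death example in Table I.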


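Proposition 1 For both Models A and B,

1) $C^{(k)}_\beta(0; \lambda)$ is submodular in $(k, \lambda)$, i.e., for $l > k$,
$C^{(l)}_\beta(0; \lambda) - C^{(k)}_\beta(0; \lambda)$ is decreasing in $\lambda$.

2) Let $k^*_\beta(\lambda) = \arg\inf_{k \ge 0} C^{(k)}_\beta(0; \lambda)$ be the optimal $k$ for a fixed
$\lambda$. Then $k^*_\beta(\lambda)$ is increasing in $\lambda$.

The proof of the proposition is in Appendix B.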


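C. Computation of $L^{(k)}_\beta$ and $M^{(k)}_\beta$

1) Model A: For Model A, the values of $L^{(k)}_\beta$ and $M^{(k)}_\beta$ can be
computed by observing that the operator $\mathcal{B}^{(k)}$ is equivalent to a matrix
multiplication. In particular, define the matrix $P^{(k)}$ as

$$P^{(k)}_{ij} := p_{|i-j|}, \quad \forall i, j \in S^{(k)}.$$

Then,

$$[\mathcal{B}^{(k)} v](e) = \sum_{n \in S^{(k)}} p_{n - ae}\, v(n) = \sum_{n \in S^{(k)}} P^{(k)}_{ae,n}\, v(n) = [P^{(k)} v]_{ae}. \tag{13}$$

With a slight abuse of notation, we are using $v$ both as a function
and a vector. Define the matrix $Q^{(k)}_\beta$ and the vector $d^{(k)}$ as follows:

$$Q^{(k)}_\beta := [I_{2k-1} - \beta P^{(k)}]^{-1}, \qquad d^{(k)} := [d(-k+1), \dots, d(k-1)]^\top.$$

Then, (11), (12) and (13) imply the following:

Proposition 2 In Model A, for any $\beta \in (0,1]$,

$$L^{(k)}_\beta = [I_{2k-1} - \beta P^{(k)}]^{-1} d^{(k)}, \tag{14}$$

$$M^{(k)}_\beta = [I_{2k-1} - \beta P^{(k)}]^{-1} \mathbf{1}_{2k-1}. \tag{15}$$

See Section III-F for an example of these calculations.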
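The matrix expressions of Proposition 2 translate directly into a few lines of code. The sketch below is illustrative only: it assumes $a = 1$, builds $I - \beta P^{(k)}$ on the silent set $\{-(k-1), \dots, k-1\}$, solves the two linear systems with a small pure-Python Gaussian elimination, and then takes the performance to be $D = L(0)/M(0)$ and $N = 1/M(0) - (1-\beta)$. The function names `solve` and `performance` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def performance(k, beta, pmf, d):
    """Return (D, N) for threshold k in Model A, assuming a = 1.

    pmf(n) gives the probability of an innovation of magnitude n >= 0.
    """
    if k == 0:
        return 0.0, 1.0                      # always transmit
    states = list(range(-(k - 1), k))        # silent set S^(k)
    n = len(states)
    # A = I - beta * P^(k), with P_ij = p_{|i - j|}
    A = [[(1.0 if i == j else 0.0) - beta * pmf(abs(states[i] - states[j]))
          for j in range(n)] for i in range(n)]
    L = solve(A, [float(d(e)) for e in states])   # expression (14)
    M = solve(A, [1.0] * n)                       # expression (15)
    mid = k - 1                                   # index of state e = 0
    return L[mid] / M[mid], 1.0 / M[mid] - (1.0 - beta)


if __name__ == "__main__":
    pmf = lambda n: {0: 0.4, 1: 0.3}.get(n, 0.0)   # birth-death chain, p = 0.3
    for k in range(1, 6):
        print(k, performance(k, 0.9, pmf, abs))
```

For the birth-death chain with $p = 0.3$ and $d(e) = |e|$, the values produced this way can be compared with the entries of Table I.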

2) Model B: For Model B, for any $\beta \in (0,1]$, (11) and (12) are
Fredholm integral equations of the second kind [34]. The solution can
be computed by identifying the inverse operator

$$\mathcal{Q}^{(k)}_\beta = [I - \beta \mathcal{B}^{(k)}]^{-1},$$

which is given by

$$[\mathcal{Q}^{(k)}_\beta v](e) = \int_{-k}^{k} R^{(k)}_\beta(e, w; a)\, v(w)\, dw, \tag{16}$$

where for any given $a$, $R^{(k)}_\beta(\cdot, \cdot; a)$ is the resolvent of $\phi$ and
can be computed using the Liouville-Neumann series. See [34] for
details. Since $\phi$ is smooth, (11) and (12) can also be solved by
discretizing the integral equation using quadrature methods. A Matlab
implementation of this approach is available in [35].
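The quadrature approach mentioned above can be sketched as follows. This is not the Matlab implementation of [35]; it is a minimal Nystrom-type discretization with a midpoint rule, written under the assumptions of a Gaussian density, quadratic distortion, and the relations $D = L(0)/M(0)$, $N = 1/M(0) - (1-\beta)$. The function names and the number of nodes `n` are hypothetical choices.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gauss_markov_performance(k, beta=1.0, a=1.0, sigma=1.0, n=60):
    """Approximate (D, N) for threshold k > 0 via a midpoint-rule discretization."""
    h = 2.0 * k / n
    nodes = [-k + (i + 0.5) * h for i in range(n)]   # midpoints of (-k, k)

    def phi(w):
        return math.exp(-w * w / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

    # Discretized operator I - beta * B^(k), kernel phi(n - a e)
    A = [[(1.0 if i == j else 0.0) - beta * h * phi(nodes[j] - a * nodes[i])
          for j in range(n)] for i in range(n)]
    L = solve(A, [e * e for e in nodes])    # quadratic distortion d(e) = e^2
    M = solve(A, [1.0] * n)
    # Evaluate both solutions at e = 0 by plugging back into the equations
    L0 = beta * h * sum(phi(w) * L[j] for j, w in enumerate(nodes))
    M0 = 1.0 + beta * h * sum(phi(w) * M[j] for j, w in enumerate(nodes))
    return L0 / M0, 1.0 / M0 - (1.0 - beta)


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0):
        print(k, gauss_markov_performance(k))
```

A midpoint rule is used only because it keeps the sketch short; higher-order rules such as Gauss-Legendre would typically need fewer nodes for the same accuracy.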
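Fig. 2: In Model A, (a) the optimal costly communication cost $C^*_\beta(\lambda)$;
(b) the distortion-transmission function $D^*_\beta(\alpha)$.

D. Main results for Model A

1) Results for costly communication:

Theorem 3 For $\beta \in (0,1]$, let $\mathbb{K}$ denote $\{k \in \mathbb{Z}_{\ge 0} : D^{(k+1)}_\beta(0) > D^{(k)}_\beta(0)\}$.
For $k_n \in \mathbb{K}$, define:

$$\lambda^{(k_n)}_\beta := \frac{D^{(k_{n+1})}_\beta(0) - D^{(k_n)}_\beta(0)}{N^{(k_n)}_\beta(0) - N^{(k_{n+1})}_\beta(0)}. \tag{17}$$

Then, we have the following.

1) For any $k_n \in \mathbb{K}$ and any $\lambda \in (\lambda^{(k_{n-1})}_\beta, \lambda^{(k_n)}_\beta]$, the strategy
$f^{(k_n)}$ is optimal for Problem 1 with communication cost $\lambda$.

2) The optimal performance $C^*_\beta(\lambda)$ is continuous, concave,
increasing and piecewise linear in $\lambda$. The corner points of $C^*_\beta(\lambda)$
are given by $\{(\lambda^{(k_n)}_\beta, D^{(k_n)}_\beta(0) + \lambda^{(k_n)}_\beta N^{(k_n)}_\beta(0))\}_{k_n \in \mathbb{K}}$ (see
Fig. 2(a)).

The proof of the theorem is given in Section VII.

2) Results for constrained communication: To describe the
solution of Problem 2, we first define Bernoulli randomized strategy and
Bernoulli randomized simple strategy [36].

Definition 1 Suppose we are given two (non-randomized)
time-homogeneous strategies $f_1$ and $f_2$ and a randomization parameter
$\theta \in (0,1)$. The Bernoulli randomized strategy $(f_1, f_2, \theta)$ is a strategy
that randomizes between $f_1$ and $f_2$ at each stage; choosing $f_1$ with
probability $\theta$ and $f_2$ with probability $(1 - \theta)$. Such a strategy is called
a Bernoulli randomized simple strategy if $f_1$ and $f_2$ differ on exactly
one state, i.e., there exists a state $e_0$ such that

$$f_1(e) = f_2(e), \quad \forall e \neq e_0.$$

Theorem 4 For any $\beta \in (0,1]$ and $\alpha \in (0,1)$, define

$$k^*_\beta(\alpha) = \sup\{k \in \mathbb{Z}_{\ge 0} : N_\beta(f^{(k)}, g^*) \ge \alpha\} = \sup\Big\{k \in \mathbb{Z}_{\ge 0} : M^{(k)}_\beta(0) \le \frac{1}{1 + \alpha - \beta}\Big\} \tag{18}$$

and

$$\theta^*_\beta(\alpha) = \frac{M^{(k^*+1)}_\beta(0) - \dfrac{1}{1 + \alpha - \beta}}{M^{(k^*+1)}_\beta(0) - M^{(k^*)}_\beta(0)}. \tag{19}$$

For ease of notation, we use $k^* = k^*_\beta(\alpha)$ and $\theta^* = \theta^*_\beta(\alpha)$.

Let $f^*$ be the Bernoulli randomized simple strategy $(f^{(k^*)}, f^{(k^*+1)}, \theta^*)$,
i.e.,

$$f^*(e) = \begin{cases} 0, & \text{if } |e| < k^*; \\ 0, & \text{w.p. } 1 - \theta^*, \text{ if } |e| = k^*; \\ 1, & \text{w.p. } \theta^*, \text{ if } |e| = k^*; \\ 1, & \text{if } |e| > k^*. \end{cases} \tag{20}$$

Then

1) $(f^*, g^*)$ is optimal for the constrained Problem 2 with
constraint $\alpha$.

2) Let $\alpha^{(k)} = N_\beta(f^{(k)}, g^*)$. Then, for $\alpha \in (\alpha^{(k+1)}, \alpha^{(k)})$, $k^* = k$
and $\theta^* = (\alpha - \alpha^{(k+1)})/(\alpha^{(k)} - \alpha^{(k+1)})$, and the
distortion-transmission function is given by

$$D^*_\beta(\alpha) = \theta^* D^{(k)}_\beta(0) + (1 - \theta^*) D^{(k+1)}_\beta(0). \tag{21}$$

Moreover, the distortion-transmission function is continuous,
convex, decreasing and piecewise linear in $\alpha$. Thus, the corner
points of $D^*_\beta(\alpha)$ are given by $\{(N^{(k)}_\beta(0), D^{(k)}_\beta(0))\}_{k=1}^{\infty}$ (see
Fig. 2(b)).

The proof of the theorem is given in Section VII.

Corollary 1 In Model A, for any $\beta \in (0,1]$,

$$D_\beta(f^{(1)}, g^*) = 0, \quad \text{and} \quad N_\beta(f^{(1)}, g^*) = \beta(1 - p_0) =: \alpha_c.$$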

E. Main results for Model B 

1) Results for costly communication: Let $\partial_k D^{(k)}_\beta$, $\partial_k N^{(k)}_\beta$ and
$\partial_k C^{(k)}_\beta$ denote the derivative of $D^{(k)}_\beta$, $N^{(k)}_\beta$ and $C^{(k)}_\beta$ with respect to
$k$ (we show that $D^{(k)}_\beta$, $N^{(k)}_\beta$ and $C^{(k)}_\beta$ are differentiable
in $k$).

Theorem 5 For $\beta \in (0,1]$, we have the following.

1) If the pair $(\lambda, k)$ satisfies the following

$$\lambda = -\frac{\partial_k D^{(k)}_\beta(0)}{\partial_k N^{(k)}_\beta(0)}, \tag{22}$$

then, the strategy $(f^{(k)}, g^*)$ is optimal for Problem 1 with
communication cost $\lambda$. Furthermore, for any $k > 0$, there exists
a $\lambda \ge 0$ that satisfies (22).

2) The optimal performance $C^*_\beta(\lambda)$ is a continuous, concave and
increasing function of $\lambda$.

The proof of the theorem is given in Section VIII. Algorithm 1 shows
how to compute $C^*_\beta(\lambda)$.
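Algorithm 1: Computation of $C^*_\beta(\lambda)$

input : $\lambda \in \mathbb{R}_{>0}$, $\beta \in (0,1]$, $\varepsilon \in \mathbb{R}_{>0}$
output: $C^{(k^\circ)}_\beta(0; \lambda)$, where $|k^\circ - k^*_\beta(\lambda)| < \varepsilon$

Let $\lambda^{(k)}_\beta$ denote the value of $\lambda$ given by (22) for threshold $k$
Pick $\underline{k}$ and $\bar{k}$ such that $\lambda^{(\underline{k})}_\beta < \lambda < \lambda^{(\bar{k})}_\beta$
$k^\circ = (\underline{k} + \bar{k})/2$
while $|\lambda^{(k^\circ)}_\beta - \lambda| > \varepsilon$ do
    if $\lambda^{(k^\circ)}_\beta < \lambda$ then
        $\underline{k} = k^\circ$
    else
        $\bar{k} = k^\circ$
    $k^\circ = (\underline{k} + \bar{k})/2$
return $D^{(k^\circ)}_\beta(0) + \lambda N^{(k^\circ)}_\beta(0)$


Algorithm 2: Computation of $D^*_\beta(\alpha)$

input : $\alpha \in (0,1)$, $\beta \in (0,1]$, $\varepsilon \in \mathbb{R}_{>0}$
output: $D^{(k^\circ)}_\beta(0)$, where $|N^{(k^\circ)}_\beta(0) - \alpha| < \varepsilon$

Pick $\underline{k}$ and $\bar{k}$ such that $N^{(\bar{k})}_\beta(0) < \alpha < N^{(\underline{k})}_\beta(0)$
$k^\circ = (\underline{k} + \bar{k})/2$
while $|N^{(k^\circ)}_\beta(0) - \alpha| > \varepsilon$ do
    if $N^{(k^\circ)}_\beta(0) < \alpha$ then
        $\bar{k} = k^\circ$
    else
        $\underline{k} = k^\circ$
    $k^\circ = (\underline{k} + \bar{k})/2$
return $D^{(k^\circ)}_\beta(0)$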


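2) Results for constrained communication:

Theorem 6 For any $\beta \in (0,1]$ and $\alpha \in (0,1)$, let $k^*_\beta(\alpha) \in \mathbb{R}_{\ge 0}$ be
such that

$$N^{(k^*_\beta(\alpha))}_\beta(0) = \alpha. \tag{23}$$

Such a $k^*_\beta(\alpha)$ always exists and we have the following:

1) The strategy $(f^{(k^*_\beta(\alpha))}, g^*)$ is optimal for Problem 2 with
constraint $\alpha$.

2) The distortion-transmission function $D^*_\beta(\alpha)$ is continuous,
convex and decreasing in $\alpha$ and is given by

$$D^*_\beta(\alpha) = D^{(k^*_\beta(\alpha))}_\beta(0). \tag{24}$$

The proof of the theorem is given in Section VIII. Algorithm 2
shows how to compute $D^*_\beta(\alpha)$.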
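The threshold search for the constrained problem is a plain bisection. The sketch below is one way to write it, under the assumption that the expected number of transmissions is available as a callable `N` that is continuous and decreasing in the threshold. The name `find_threshold` and the toy curve used in the usage example are hypothetical stand-ins, not quantities from the paper.

```python
def find_threshold(N, alpha, k_lo, k_hi, eps=1e-6, max_iter=200):
    """Bisection for k with N(k) = alpha, assuming N is decreasing in k.

    Requires a bracket with N(k_hi) < alpha < N(k_lo).
    """
    if not (N(k_hi) < alpha < N(k_lo)):
        raise ValueError("initial thresholds do not bracket alpha")
    k = 0.5 * (k_lo + k_hi)
    for _ in range(max_iter):
        value = N(k)
        if abs(value - alpha) <= eps:
            break
        if value < alpha:
            k_hi = k      # threshold too large: too few transmissions
        else:
            k_lo = k      # threshold too small: too many transmissions
        k = 0.5 * (k_lo + k_hi)
    return k


if __name__ == "__main__":
    # Toy decreasing curve used only to exercise the search (hypothetical).
    toy_N = lambda k: 1.0 / (1.0 + k * k)
    print(find_threshold(toy_N, 0.2, 0.0, 10.0))
```

In practice `N` would be replaced by a numerical evaluation of the expected number of transmissions for Model B, and the returned threshold would then be plugged into the corresponding distortion expression.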

3) Special case of Model B (Gauss-Markov model): In general,
the optimal thresholds, and the functions $C^*_\beta(\lambda)$ and $D^*_\beta(\alpha)$ depend
on the noise distribution $\phi(\cdot)$. For the Gauss-Markov model, the
dependence on the variance $\sigma^2$ of the noise may be quantified exactly.

For ease of notation, we drop the dependence on $\beta$ from the
notation, and instead, show the dependence on $\sigma$. Thus, $C^*_\sigma(\lambda)$
denotes the optimal value for the costly communication case when
the noise variance is $\sigma^2$. Similar notation holds for other terms.

Theorem 7 For the Gauss-Markov model, for Problem 1,
$k^*_\sigma(\lambda) = \sigma k^*_1(\lambda/\sigma^2)$ and $C^*_\sigma(\lambda) = \sigma^2 C^*_1(\lambda/\sigma^2)$. For Problem 2,
$k^*_\sigma(\alpha) = \sigma k^*_1(\alpha)$ and $D^*_\sigma(\alpha) = \sigma^2 D^*_1(\alpha)$.

The proof of the theorem is given in Section VIII.

An implication of the above theorem is that we only need to
numerically compute $C^*_1(\lambda)$ and $D^*_1(\alpha)$, which are shown in Fig. 3.
The optimal total communication cost and the distortion-transmission
function for any other value of $\sigma^2$ can be obtained by simply scaling
$C^*_1(\lambda)$ and $D^*_1(\alpha)$ respectively.

Fig. 3: Gauss-Markov model ($\sigma^2 = 1$ and $a = 1$): (a) optimal
costly communication cost $C^*_1(\lambda)$; (b) distortion-transmission
function $D^*_1(\alpha)$.

F. An example for Model A: symmetric birth-death Markov chain 

An example of a Markov process and a distortion function that
satisfy Model A is the following:

Example 1 Consider a Markov chain of the form (1) where the pmf
of $W_t$ is given by

$$p_n = \begin{cases} p, & \text{if } |n| = 1; \\ 1 - 2p, & \text{if } n = 0; \\ 0, & \text{otherwise}, \end{cases}$$

where $p \in (0, \tfrac{1}{3})$. The distortion function is taken as $d(e) = |e|$.

This Markov process corresponds to a symmetric, birth-death
Markov chain defined over $\mathbb{Z}$ as shown in Fig. 4, with the transition
probability matrix given by

$$P_{ij} = \begin{cases} p, & \text{if } |i - j| = 1; \\ 1 - 2p, & \text{if } i = j; \\ 0, & \text{otherwise}. \end{cases}$$
TABLE I: Values of $D^{(k)}_\beta(0)$, $N^{(k)}_\beta(0)$ and $\lambda^{(k)}_\beta$ for different values of $k$ and $\beta$ for the Markov chain of Example 1 with $p = 0.3$. Note that
$D^{(1)}_\beta(0) = D^{(0)}_\beta(0)$; therefore $\mathbb{K}$ defined in Theorem 3 equals $\mathbb{Z}_{>0}$.

(a) For $\beta = 0.9$

  k     D^{(k)}_β(0)   N^{(k)}_β(0)   λ^{(k)}_β
  0     0              1              -
  1     0              0.5400         1.0989
  2     0.4576         0.1236         4.1021
  3     0.7695         0.0475         9.2839
  4     1.0066         0.0220         16.2509
  5     1.1844         0.0111         24.4478
  6     1.3130         0.0058         33.4121
  7     1.4029         0.0031         42.8289
  8     1.4638         0.0017         52.5042
  9     1.5040         0.0009         62.3245
  10    1.5298         0.0005         72.2255

(b) For $\beta = 0.95$

  k     D^{(k)}_β(0)   N^{(k)}_β(0)   λ^{(k)}_β
  0     0              1              -
  1     0              0.5700         1.1050
  2     0.4790         0.1365         4.3657
  3     0.8282         0.0565         10.6058
  4     1.1218         0.0288         19.9550
  5     1.3715         0.0163         32.0869
  6     1.5811         0.0098         46.4727
  7     1.7536         0.0061         62.5651
  8     1.8927         0.0039         79.8921
  9     2.0028         0.0025         98.0854
  10    2.0884         0.0016         116.8739

(c) For $\beta = 1.0$

  k     D^{(k)}_β(0)   N^{(k)}_β(0)   λ^{(k)}_β
  0     0              1              -
  1     0              0.6000         1.1111
  2     0.5000         0.1500         4.6667
  3     0.8889         0.0667         12.3810
  4     1.2500         0.0375         25.9259
  5     1.6000         0.0240         46.9697
  6     1.9444         0.0167         77.1795
  7     2.2857         0.0122         118.2222
  8     2.6250         0.0094         171.7647
  9     2.9630         0.0074         239.4737
  10    3.0000         0.0060         323.0159

Fig. 5: Plots of $D^*_\beta(\alpha)$ vs $\alpha$ for different $\beta$ for the birth-death Markov chain of Example 1 with $p = 0.3$: (a) $\beta = 0.9$; (b) $\beta = 0.95$; (c) $\beta = 1.0$.


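1) Performance of a generic threshold-based strategy:

Lemma 2 1) For $\beta \in (0,1)$,

$$D^{(k)}_\beta(0) = \frac{\sinh(k m_\beta) - k \sinh(m_\beta)}{2 \sinh^2(k m_\beta / 2) \sinh(m_\beta)},$$

$$N^{(k)}_\beta(0) = \frac{2 \beta p \sinh^2(m_\beta / 2) \cosh(k m_\beta)}{\sinh^2(k m_\beta / 2)} - (1 - \beta).$$

2) For $\beta = 1$,

$$D^{(k)}_1(0) = \frac{k^2 - 1}{3k}, \qquad N^{(k)}_1(0) = \frac{2p}{k^2},$$

and

$$\lambda^{(k)}_1 = \frac{k(k+1)(k^2 + k + 1)}{6p(2k+1)}.$$

Fig. 6: Plot of $C^*_\beta(\lambda)$ vs $\lambda$ for the Markov chain of Example 1 with
$p = 0.3$.

The proof is given in Section IX.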
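The closed forms of Lemma 2 are easy to evaluate numerically. The sketch below is illustrative only. The definition of $m_\beta$ does not survive in the text above, so the code assumes $\cosh(m_\beta) = 1 + (1-\beta)/(2\beta p)$, which follows from the characteristic equation of the birth-death recursion and reproduces the tabulated values; it also assumes the discounted expression for $N$ carries the $-(1-\beta)$ correction. The function name is hypothetical.

```python
import math


def birth_death_performance(k, beta, p):
    """Closed-form (D, N) for the birth-death chain with d(e) = |e|."""
    if k == 0:
        return 0.0, 1.0                      # always transmit
    if beta == 1.0:
        return (k * k - 1) / (3.0 * k), 2.0 * p / (k * k)
    # Assumed definition: cosh(m) = 1 + (1 - beta) / (2 beta p)
    m = math.acosh(1.0 + (1.0 - beta) / (2.0 * beta * p))
    half = math.sinh(k * m / 2.0) ** 2
    D = (math.sinh(k * m) - k * math.sinh(m)) / (2.0 * half * math.sinh(m))
    N = (2.0 * beta * p * math.sinh(m / 2.0) ** 2 * math.cosh(k * m) / half
         - (1.0 - beta))
    return D, N


if __name__ == "__main__":
    for k in range(1, 6):
        D, N = birth_death_performance(k, 0.9, 0.3)
        print(k, round(D, 4), round(N, 4))
```

Evaluating the function for $p = 0.3$ gives a quick cross-check of individual entries of Table I without solving any linear system.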

2) Optimal strategy for costly communication: Using the above
expressions for $D^{(k)}_\beta(0)$ and $N^{(k)}_\beta(0)$, we can identify $\mathbb{K}$ and for each
$k_n \in \mathbb{K}$, compute $\lambda^{(k_n)}_\beta$ according to (17). These values are tabulated
in Table I for different values of $\beta$ (all for $p = 0.3$). Using Table I, we
can compute the corner points $(\lambda^{(k_n)}_\beta, D^{(k_n)}_\beta(0) + \lambda^{(k_n)}_\beta N^{(k_n)}_\beta(0))$ of
$C^*_\beta(\lambda)$. Joining these points by straight lines gives $C^*_\beta(\lambda)$, as shown
in Fig. 6. The optimal strategy for a given $\lambda$ can be computed from
Table I.

For example, for $\lambda = 20$, $\beta = 0.9$, we can find from Table I that
$\lambda \in (\lambda^{(4)}_\beta, \lambda^{(5)}_\beta]$. Hence, $k^*_\beta = 5$ (i.e., the strategy $f^{(5)}$ is optimal)
and the optimal total communication cost is

$$C^*_{0.9}(20) = D^{(5)}_{0.9}(0) + 20 N^{(5)}_{0.9}(0) = 1.1844 + 20 \times 0.0111 = 1.4064.$$
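The look-up procedure for the costly communication problem can be written as a short table scan. The sketch below is illustrative: it hard-codes the first six rows of the $\beta = 0.9$ panel of Table I and applies the interval rule of Theorem 3 (threshold $k$ is optimal when $\lambda$ lies between the previous corner value and $\lambda^{(k)}_\beta$). The names `TABLE_09` and `optimal_cost` are hypothetical.

```python
# Table I, panel (a): beta = 0.9, p = 0.3.  k -> (D, N, lambda)
TABLE_09 = {
    1: (0.0000, 0.5400, 1.0989),
    2: (0.4576, 0.1236, 4.1021),
    3: (0.7695, 0.0475, 9.2839),
    4: (1.0066, 0.0220, 16.2509),
    5: (1.1844, 0.0111, 24.4478),
    6: (1.3130, 0.0058, 33.4121),
}


def optimal_cost(lam, table):
    """Return (k, cost) for communication cost lam using the tabulated corners."""
    for k in sorted(table):
        D, N, lam_k = table[k]
        if lam <= lam_k:              # lam lies in (lambda^(k-1), lambda^(k)]
            return k, D + lam * N
    raise ValueError("lam is beyond the tabulated corner points")


if __name__ == "__main__":
    k_star, cost = optimal_cost(20.0, TABLE_09)
    print(k_star, round(cost, 4))
```

For $\lambda = 20$ the scan stops at $k = 5$ and returns the cost 1.4064 computed in the text.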


3) Optimal strategy for constrained communication: Using the
values in Table I, we can also compute the corner points
$(N^{(k)}_\beta(0), D^{(k)}_\beta(0))$ of $D^*_\beta(\alpha)$. Joining these points by straight lines
gives $D^*_\beta(\alpha)$ (see Fig. 5). The optimal strategy for a given $\alpha$ can
be computed from Table I. For example, at $\alpha = 0.1$ and $\beta = 0.9$,
$k^*_\beta(\alpha)$ is the largest value of $k$ such that $N^{(k)}_\beta(0) \ge \alpha$. Thus, from
Table I, we get that $k^* = 2$. Then, by Theorem 4,

$$\theta^* = \frac{\alpha - N^{(3)}_\beta(0)}{N^{(2)}_\beta(0) - N^{(3)}_\beta(0)} = 0.6899.$$

Let $f^* = (f^{(2)}, f^{(3)}, \theta^*)$. Then the Bernoulli randomized simple
strategy $(f^*, g^*)$ is optimal for Problem 2 for $\beta \in (0,1)$. Furthermore,
by (21), $D^*_\beta(\alpha) = 0.5543$.
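The constrained-communication example can be reproduced with a few lines of arithmetic. The sketch below is illustrative: it hard-codes a few rows of the $\beta = 0.9$ panel of Table I, picks the largest $k$ whose transmission rate is at least $\alpha$, and interpolates between thresholds $k$ and $k+1$ as in the worked example. The names `TABLE_09` and `randomized_threshold` are hypothetical.

```python
# Table I, panel (a): beta = 0.9, p = 0.3.  k -> (D, N)
TABLE_09 = {
    1: (0.0000, 0.5400),
    2: (0.4576, 0.1236),
    3: (0.7695, 0.0475),
    4: (1.0066, 0.0220),
    5: (1.1844, 0.0111),
}


def randomized_threshold(alpha, table):
    """Return (k*, theta*, D*) for transmission-rate constraint alpha."""
    candidates = [k for k in table if table[k][1] >= alpha]
    if not candidates or max(candidates) + 1 not in table:
        raise ValueError("alpha is outside the tabulated range")
    k_star = max(candidates)
    D_k, N_k = table[k_star]
    D_next, N_next = table[k_star + 1]
    theta = (alpha - N_next) / (N_k - N_next)      # weight on threshold k*
    return k_star, theta, theta * D_k + (1.0 - theta) * D_next


if __name__ == "__main__":
    k_star, theta, D = randomized_threshold(0.1, TABLE_09)
    print(k_star, round(theta, 4), round(D, 4))
```

For $\alpha = 0.1$ this returns $k^* = 2$, $\theta^* \approx 0.6899$ and $D^*_\beta(\alpha) \approx 0.5543$, matching the numbers in the text.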



























IV. Salient features and discussion 
A. Comparison with periodic and randomized strategies 

In our model, we assume that the transmission decision depends 
on the state of the Markov process. In some of the remote estimation 
literature, it is assumed that the transmission schedule does not 
depend on the state of the Markov process. Two such commonly 
used strategies are: 

1) Periodic transmission strategy with period $T$:

$$U_t = f_p(t \bmod T),$$

where $f_p$ maps $\{0, 1, \dots, T-1\}$ to $\{0, 1\}$.

2) Random transmission strategy:

$$U_t = \begin{cases} 1, & \text{w.p. } \alpha \\ 0, & \text{w.p. } 1 - \alpha. \end{cases}$$

Below, we compare the performance of the threshold-based strategy with these two strategies for the long-term average setup for Problem 2 for Model B with $a = 1$.

1) Performance of the periodic strategy: In general, the performance of a periodic transmission strategy depends on the choice of transmission function $f_p$. For ease of calculation we consider the values of $(\alpha, T)$ for which $f_p$ is unique.

1) $\alpha = 1/T$, $T \in \mathbb{Z}_{>0}$, i.e., the transmitter remains silent for $(T - 1)$ steps and then transmits once. The expected distortion in this case is

$$D_{\text{per}}(\alpha) = \frac{1}{T}\, \mathbb{E}\Big[\sum_{t=0}^{T-1} E_t^2\Big] \overset{(a)}{=} \frac{1}{T} \cdot \frac{(T-1)T}{2}\, \sigma^2 = \frac{\sigma^2}{2}\Big(\frac{1}{\alpha} - 1\Big),$$

where (a) uses $E_t = W_0 + W_1 + W_2 + \cdots + W_{t-1}$.

2) $\alpha = (T - 1)/T$, $T \in \mathbb{Z}_{>0}$, i.e., the transmitter remains silent for 1 step and then transmits for $(T - 1)$ steps. The expected distortion in this case is

$$D_{\text{per}}(\alpha) = \frac{1}{T}\, \mathbb{E}[E_1^2] = \frac{\sigma^2}{T} = \sigma^2 (1 - \alpha).$$


2) Performance of a generic stationary transmission strategy: Next, we derive an expression for $D_\beta(f, g^*)$ for an arbitrary stationary transmission strategy $f$ (that does not use the value of the state $E_t$ to determine when to transmit, so the receiver is the same as in Theorem 1) for the long-term average setup for Model B when $a = 1$.

Proposition 3 For $\beta = 1$ and $a = 1$ in Model B, let $f$ be an arbitrary stationary transmission strategy. Let $\tau$ denote the stopping time of the first transmission under $f$. Then

$$D_1(f, g^*) = \frac{\sigma^2}{2}\Big[\frac{\mathbb{E}(\tau^2)}{\mathbb{E}(\tau)} - 1\Big].$$

Proof: For any $t < \tau$, $E_t = W_0 + \cdots + W_{t-1}$. Therefore, $\mathbb{E}[E_t^2] = t\sigma^2$. Define $L(\tau) = \sum_{t=0}^{\tau-1} t\sigma^2 = \sigma^2 \tau(\tau - 1)/2$. Now, $L_1(0) = \mathbb{E}[L(\tau)] = (\sigma^2/2)[\mathbb{E}(\tau^2) - \mathbb{E}(\tau)]$ and $M_1(0) = \mathbb{E}(\tau)$. By using the same argument as in the proof of Theorem 2, we get $D_1(f, g^*) = L_1(0)/M_1(0)$, which implies the result. ■

3) Performance of randomized transmission strategy: For the randomized strategy defined above, $\tau$ is a $\mathrm{Geom}(\alpha)$ random variable. Therefore, $\mathbb{E}(\tau^2) = 2/\alpha^2 - 1/\alpha$ and $\mathbb{E}(\tau) = 1/\alpha$. Hence, following Proposition 3, we have

$$D_{\text{rand}}(\alpha) = \sigma^2\Big[\frac{1}{\alpha} - 1\Big].$$

Fig. 7 shows that the threshold-based strategy performs considerably better than the periodic transmission strategy and the randomized transmission strategy.
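The two closed forms can be recovered from Proposition 3 by plugging in the distribution of the inter-transmission time $\tau$. The sketch below does this numerically; the helper names are illustrative, and the geometric distribution is truncated at a large horizon `t_max`, which is an approximation introduced here.

```python
def distortion_from_tau(pmf, sigma2=1.0):
    # Proposition 3: D_1 = (sigma^2 / 2) * (E[tau^2] / E[tau] - 1)
    # pmf maps each possible value of tau to its probability.
    first = sum(t * q for t, q in pmf.items())
    second = sum(t * t * q for t, q in pmf.items())
    return 0.5 * sigma2 * (second / first - 1.0)

def periodic_tau(period):
    # Silent for (period - 1) steps, then transmit once: tau = period w.p. 1
    return {period: 1.0}

def geometric_tau(alpha, t_max=10000):
    # Transmit independently w.p. alpha at each step (truncated support)
    return {t: alpha * (1.0 - alpha) ** (t - 1) for t in range(1, t_max + 1)}

alpha = 0.25
d_per = distortion_from_tau(periodic_tau(4))        # alpha = 1/T with T = 4
d_rand = distortion_from_tau(geometric_tau(alpha))
print(d_per, d_rand)
```

For the same transmission rate $\alpha = 1/4$ and $\sigma^2 = 1$, the periodic strategy gives $(1/\alpha - 1)/2$ and the randomized strategy gives $1/\alpha - 1$, so the randomized schedule incurs twice the distortion of the periodic one.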



Fig. 7: Comparison of the performances of the threshold-based strategy (denoted by $D_{\text{opt}}$) with periodic and randomized transmission strategies (denoted by $D_{\text{per}}$ and $D_{\text{rand}}$, respectively) for a Gauss-Markov process with $a = 1$ and $\sigma^2 = 1$.


B. Discussion on deterministic implementation 

The optimal strategy shown in Theorem 4 chooses a randomized
action in states {—k*, k*}. It is also possible to identify deterministic 
(non-randomized) but time-varying strategies that achieve the same 
performance. We describe two such strategies for the long-term 
average setup. 

1) Steering strategies: Let $a_t^0$ (respectively, $a_t^1$) denote the number of times the action $u_t = 0$ (respectively, the action $u_t = 1$) has been chosen in states $\{-k^*, k^*\}$ in the past, i.e.,

$$a_t^i = \sum_{s=0}^{t-1} \mathbb{1}\{|E_s| = k^*,\ u_s = i\}, \quad i \in \{0, 1\}.$$

Thus, the empirical frequency of choosing action $u_t = i$, $i \in \{0,1\}$, in states $\{-k^*, k^*\}$ is $a_t^i/(a_t^0 + a_t^1)$. A steering strategy compares these empirical frequencies with the desired randomization probabilities $\theta^0 = 1 - \theta^*$ and $\theta^1 = \theta^*$ and chooses an action that steers the empirical frequency closer to the desired randomization probability. More formally, at states $\{-k^*, k^*\}$, the steering transmission strategy chooses the action


$$\arg\min_{i} \Big\{ \frac{a_t^i + 1}{a_t^0 + a_t^1 + 1} - \theta^i \Big\}$$


in states $\{-k^*, k^*\}$ and chooses deterministic actions according to $f^*$ (given in (20)) in states except $\{-k^*, k^*\}$. Note that the above strategy is deterministic (non-randomized) but depends on the history of visits to states $\{-k^*, k^*\}$. Such strategies were proposed in [37], where it was shown that the steering strategy described above achieves the same performance as the randomized strategy $f^*$ and hence is optimal for Problem 2 for $\beta = 1$. Variations of such steering strategies have been proposed in [38], [39], where the adaptation was done by comparing the sample path average cost with the expected value (rather than by comparing empirical frequencies).
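To illustrate how a steering rule tracks the target randomization probability without using randomness, here is a small sketch. It assumes the rule picks, at each visit to $\{-k^*, k^*\}$, the action $i$ that minimizes $(a^i + 1)/(a^0 + a^1 + 1) - \theta^i$; the function name and the tie-breaking towards action 0 are choices made for this illustration.

```python
def steering_actions(theta_star, n_visits):
    # target[i] is the desired long-run frequency of action i
    target = (1.0 - theta_star, theta_star)
    counts = [0, 0]          # counts[i] = times action i was chosen so far
    actions = []
    for _ in range(n_visits):
        total = counts[0] + counts[1]
        # Pick the action whose frequency, after being chosen once more,
        # exceeds its target the least (ties go to action 0).
        action = min((0, 1),
                     key=lambda i: (counts[i] + 1) / (total + 1) - target[i])
        counts[action] += 1
        actions.append(action)
    return actions

theta_star = 0.6899
n_visits = 1000
actions = steering_actions(theta_star, n_visits)
empirical = sum(actions) / n_visits
print(empirical)
```

Under this rule the count of action 1 stays within about one half of $n\theta^*$ after $n$ visits, so the empirical frequency converges to $\theta^*$ at rate $1/n$, which is faster than the $1/\sqrt{n}$ fluctuations of independent coin flips.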

2) Time-sharing strategies: Define a cycle to be the period of time between consecutive visits of process $\{E_t\}_{t \ge 0}$ to state zero. A time-sharing strategy is defined by a series $\{(a_m, b_m)\}_{m=0}^{\infty}$ and uses strategy $f^{(k^*)}$ for the first $a_0$ cycles, uses strategy $f^{(k^*+1)}$ for the next $b_0$ cycles, and continues to alternate between using strategy $f^{(k^*)}$ for $a_m$ cycles and strategy $f^{(k^*+1)}$ for $b_m$ cycles. In particular, if $(a_m, b_m) = (a, b)$ for all $m$, then the time-sharing strategy is a periodic strategy that uses $f^{(k^*)}$ for $a$ cycles and $f^{(k^*+1)}$ for $b$ cycles.

The performance of such time-sharing strategies was evaluated in [40], where it was shown that if the cycle-lengths of the time-sharing strategy are chosen such that

$$\lim_{M \to \infty} \frac{\sum_{m=0}^{M} a_m}{\sum_{m=0}^{M} (a_m + b_m)} = \frac{\theta^* N_1^{(k^*)}}{\theta^* N_1^{(k^*)} + (1 - \theta^*) N_1^{(k^*+1)}} = \frac{\theta^* N_1^{(k^*)}}{\alpha},$$

then the time-sharing strategy $\{(a_m, b_m)\}_{m=0}^{\infty}$ achieves the same performance as the randomized strategy $f^*$ and hence, is optimal for Problem 2 for $\beta = 1$.
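A short worked example may help to see how a time-sharing split meets the communication constraint. The sketch below uses the $\beta = 1$ expression $N_1^{(k)}(0) = 2p/k^2$ from Lemma 2 and treats each cycle as a renewal interval that ends with exactly one transmission, so that its expected length is $1/N_1^{(k)}(0)$; that simplification, the share formula $\theta^* N_1^{(k^*)}/\alpha$, and all names are assumptions of this illustration.

```python
from fractions import Fraction

def n1(k, p):
    # N_1^(k)(0) = 2p / k^2 for the birth-death chain (Lemma 2, part 2)
    return 2 * p / k ** 2

p = Fraction(3, 10)
alpha = Fraction(1, 10)

# k* is the largest k with N_1^(k)(0) >= alpha
k_star = 1
while n1(k_star + 1, p) >= alpha:
    k_star += 1

# Randomization probability that meets the constraint with equality
theta = (alpha - n1(k_star + 1, p)) / (n1(k_star, p) - n1(k_star + 1, p))

# Fraction of cycles that should be run with threshold k*
share = theta * n1(k_star, p) / alpha

# Long-run transmission rate: one transmission per cycle divided by
# the average cycle length under the time-sharing split
mean_cycle = share / n1(k_star, p) + (1 - share) / n1(k_star + 1, p)
rate = 1 / mean_cycle
print(k_star, theta, share, rate)
```

Here $k^* = 2$, $\theta^* = 2/5$ and the share is $3/5$, so a periodic split with $(a, b) = (3, 2)$ cycles attains the rate $\alpha = 1/10$ exactly under the stated assumption.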


V. Proof of the structural result: Theorem 1
A. Finite horizon setup 

A finite horizon version of Problem 1 has been investigated in [1] (for Model A) and in [18], [20] (for Model B), where the structure of the optimal transmission and estimation strategy was established.

Theorem 8 [1], [18], [20] For both Models A and B, for a finite horizon version of Problem 1 we have the following.

1) Structure of optimal estimation strategy: the estimation strategy defined in Theorem 1 is optimal.

2) Structure of optimal transmission strategy: define $E_t$ as in Theorem 1. Then there exist thresholds $\{k_t\}$ such that the transmission strategy

$$U_t = f_t(E_t) := \begin{cases} 1, & \text{if } |E_t| \ge k_t; \\ 0, & \text{if } |E_t| < k_t \end{cases} \tag{25}$$

is optimal.

The above structural results were obtained in [1, Theorems 2 and 3] for Model A and in [18, Theorem 1] and [20, Lemmas 1, 3 and 4] for Model B.


Remark 5 The results in [1] were derived under the assumption that
{Wt} has finite support. These results can be generalized for {Wt} 
having countable support using ideas from Q. For that reason, we 
state Theorem 8 without any restriction on the support of $\{W_t\}$. See the supplementary document for the generalization of [1, Theorems 2 and 3] to $\{W_t\}$ with countable support.


B. Infinite horizon setup 

In a general real-time communication system, the optimal esti¬ 
mation strategy depends on the choice of the transmission strategy 
and vice-versa. Theorem 8 shows that when the noise process and
the distortion function satisfy appropriate symmetry assumptions, 
the optimal estimation strategy can be specified in closed form. 
Consequently, we can fix the estimation strategy to be of the above 
form and consider the optimization problem of identifying the best 
transmission strategy. This optimization problem has a single decision 
maker—the transmitter—and we use techniques from centralized 
stochastic control to solve it. Since the optimal estimation strategy 
is time-homogeneous, one expects the optimal transmission strategy 
(i.e., the choice of the optimal thresholds $\{k_t\}_{t \ge 0}$) to be time-homogeneous as well. The technical difficulty in establishing such a
result is that the state space is not compact and the distortion function 
may be unbounded. 

To prove Theorem 1 we proceed as follows:

1) We show that the result of the theorem is true for $\beta \in (0,1)$ and the optimal strategy is given by an appropriate dynamic program.

2) We show that for the discounted setup, the value function of the dynamic program is even and increasing on $\mathbb{X}_{\ge 0}$.

3) For $\beta = 1$, we use the vanishing discount approach to show that the optimal strategy for the long-term average cost setup may be determined as a limit of the optimal strategy for the discounted cost setup as the discount factor $\beta \uparrow 1$.

1) The discounted setup:

Lemma 3 In Model A, an optimal transmission strategy is given by the unique and bounded solution of the following dynamic program: for all $e \in \mathbb{Z}$,

$$V_\beta(e; \lambda) = \min\Big\{ (1-\beta)\lambda + \beta \sum_{w \in \mathbb{Z}} p_w V_\beta(w; \lambda),\ (1-\beta) d(e) + \beta \sum_{w \in \mathbb{Z}} p_w V_\beta(ae + w; \lambda) \Big\}. \tag{26}$$

Proof: When $d(\cdot)$ is bounded, the per-step cost $c(e, u) := (1-\beta)[\lambda u + d(e)(1-u)]$, $u \in \{0,1\}$, for a given $\lambda$ is also bounded and hence according to [42, Proposition 4.7.1, Theorem 4.6.3], there exists a unique and bounded solution $V_\beta(e; \lambda)$ of the dynamic program (26).

When $d(\cdot)$ is unbounded, then for any communication cost $\lambda$, we first define $e_0 \in \mathbb{Z}_{\ge 0}$, $e_0 < \infty$, as:

$$e_0 := \min\Big\{ e : d(e) \ge \frac{\lambda}{1-\beta} \Big\}.$$

Now, for any state $e$, $|e| \ge e_0$, the per-step cost $(1-\beta) d(e)$ of not transmitting is greater than the cost of transmitting at each step in the future, which is given by $(1-\beta) \sum_{t=0}^{\infty} \beta^t \lambda = \lambda$. Thus, the optimal action is to transmit, i.e., $f^*(e) = 1$. Hence, the dynamic program can be written as

$$V_\beta(e; \lambda) = \min\{ V_\beta^0(e; \lambda),\ V_\beta^1(e; \lambda) \},$$

where

$$V_\beta^0(e; \lambda) = (1-\beta) d(e) + \beta \sum_{w \in \mathbb{Z}} p_w V_\beta(ae + w; \lambda), \qquad V_\beta^1(e; \lambda) = (1-\beta)\lambda + \beta \sum_{w \in \mathbb{Z}} p_w V_\beta(w; \lambda).$$

Let $\mathcal{E}^* := \{e : |e| \ge e_0\}$. Then, for all $e \in \mathcal{E}^*$, $V_\beta(e; \lambda)$ is constant. Thus, (26) is equivalent to a finite-state Markov decision process with state space $\{-e_0 + 1, \dots, e_0 - 1\} \cup e^*$ (where $e^*$ is a generic state for all states in the set $\mathcal{E}^*$). Since the state space is now finite, the dynamic program (26) has a unique and bounded time-homogeneous solution by the argument given for bounded $d(\cdot)$. ■
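The finite-state reduction used in the proof suggests a direct way to solve (26) numerically. The following is a minimal value-iteration sketch for a birth-death chain with $a = 1$; the distortion $d(e) = |e|$, the truncation level `e_max` (at which transmission is forced), and all function names are assumptions made for this illustration rather than part of the paper.

```python
def solve_costly_dp(lam, beta, p, e_max=30, tol=1e-12):
    # Increments W_t of the birth-death chain: -1, 0, +1
    noise = {-1: p, 0: 1.0 - 2.0 * p, 1: p}
    states = range(-e_max, e_max + 1)
    V = {e: 0.0 for e in states}

    def transmit_value(V):
        # (1 - beta) * lam + beta * sum_w p_w V(w)
        return (1.0 - beta) * lam + beta * sum(q * V[w] for w, q in noise.items())

    def wait_value(V, e):
        # (1 - beta) * d(e) + beta * sum_w p_w V(e + w), with d(e) = |e|
        return (1.0 - beta) * abs(e) + beta * sum(
            q * V[e + w] for w, q in noise.items())

    while True:
        tx = transmit_value(V)
        # States on the truncation boundary always transmit
        new_V = {e: tx if abs(e) >= e_max else min(tx, wait_value(V, e))
                 for e in states}
        gap = max(abs(new_V[e] - V[e]) for e in states)
        V = new_V
        if gap < tol:
            break

    tx = transmit_value(V)
    # Smallest non-negative state at which transmitting is (weakly) optimal
    threshold = next(e for e in range(e_max + 1)
                     if e == e_max or tx <= wait_value(V, e))
    return V, threshold

V, k_star = solve_costly_dp(lam=20.0, beta=0.9, p=0.3)
print(k_star, round(V[0], 4))
```

For $\lambda = 20$, $\beta = 0.9$ and $p = 0.3$ the iteration should recover the threshold 5, with $V_\beta(0; \lambda)$ matching the optimal cost of the threshold strategy $f^{(5)}$ (about 1.4055 before rounding).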


Lemma 4 In Model B, an optimal transmission strategy is given by the unique and bounded solution of the following dynamic program: for all $e \in \mathbb{R}$,

$$V_\beta(e; \lambda) = \min\Big\{ (1-\beta)\lambda + \beta \int_{\mathbb{R}} \phi(w) V_\beta(w; \lambda)\, dw,\ (1-\beta) d(e) + \beta \int_{\mathbb{R}} \phi(w) V_\beta(ae + w; \lambda)\, dw \Big\}. \tag{27}$$

Proof: When $d(\cdot)$ is bounded, the per-step cost $c(e, u)$, as defined in the proof of Lemma 3, for a given $\lambda$ is also bounded. Let $K = (1-\beta) \sup_{e \in \mathbb{R}} \{d(e)\}$. Then, the strategy 'always transmit' satisfies [43, Assumption 4.2.2] with $V_\beta(e; \lambda) \le K/(1-\beta)$. Also, $\lambda$, $d(\cdot)$ and $\phi$ satisfy [43, Assumption 4.2.1]. Hence, the above dynamic program has a unique and bounded solution due to [43, Theorem 4.2.3].

When $d(\cdot)$ is unbounded, define $e_0$ and $e^*$ as in the proof of Lemma 3. By an argument similar to that in the proof of Lemma 3, we can restrict the state space of (27) to $[-e_0, e_0] \cup e^*$. Hence, the state space is compact and on this state space $d(\cdot)$ is bounded. Thus, the dynamic program (27) has a unique and bounded solution by the argument given for bounded $d(\cdot)$. ■

Proof of Theorem 1 for $\beta \in (0,1)$: The structure of the optimal strategies follows from Theorem 8. The optimal thresholds are time invariant because the corresponding dynamic programs (26) and (27) have a unique fixed point. ■

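2) Properties of the value function:

Proposition 4 For any $a \in \mathbb{X}_{>0}$, consider the two Markov processes $\{X_t^{(+)}\}$ and $\{X_t^{(-)}\}$ such that $X_0^{(+)} = X_0^{(-)} = 0$ and

$$X_{t+1}^{(+)} = a X_t^{(+)} + W_t \quad \text{and} \quad X_{t+1}^{(-)} = -a X_t^{(-)} + W_t.$$

Let $V_\beta^{(+)}$ and $V_\beta^{(-)}$ be the value functions corresponding to $\{X_t^{(+)}\}$ and $\{X_t^{(-)}\}$. Then

$$V_\beta^{(+)}(e) = V_\beta^{(-)}(e), \quad \forall e.$$

Therefore, if $k$ is an optimal threshold for $\{X_t^{(+)}\}$, then $k$ is also optimal for $\{X_t^{(-)}\}$.

See Appendix C for the proof.

Remark 6 As a consequence of the above proposition, we can restrict attention to $a > 0$ while proving the properties of the value function $V_\beta(\cdot)$.

Proposition 5 For any $\lambda > 0$ and $\beta \in (0,1)$, the value functions $V_\beta(\cdot; \lambda)$ given by (26) and (27) are even and increasing on $\mathbb{X}_{\ge 0}$.

See Appendix C for the proof.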

3) The long-term average setup:

Proposition 6 For any $\lambda > 0$, the value function $V_\beta(\cdot; \lambda)$ for Models A and B, as given by (26) and (27) respectively, satisfies the following SEN conditions of [42], [43]:

(S1) There exists a reference state $e_0 \in \mathbb{X}$ and a non-negative scalar $M_\lambda$ such that $V_\beta(e_0, \lambda) < M_\lambda$ for all $\beta \in (0,1)$.

(S2) Define $h_\beta(e; \lambda) = (1-\beta)^{-1}[V_\beta(e; \lambda) - V_\beta(e_0; \lambda)]$. There exists a function $K_\lambda : \mathbb{X} \to \mathbb{R}$, such that $h_\beta(e; \lambda) \le K_\lambda(e)$ for all $e \in \mathbb{X}$ and $\beta \in (0,1)$.

(S3) There exists a non-negative (finite) constant $L_\lambda$ such that $-L_\lambda \le h_\beta(e; \lambda)$ for all $e \in \mathbb{X}$ and $\beta \in (0,1)$.

Therefore, if $f_\beta$ denotes an optimal strategy for $\beta \in (0,1)$, and $f_1$ is any limit point of $\{f_\beta\}$, then $f_1$ is optimal for $\beta = 1$.

Proof: Let $V_\beta^{(0)}(e, \lambda)$ denote the value function of the 'always transmit' strategy. Since $V_\beta(e, \lambda) \le V_\beta^{(0)}(e, \lambda)$ and $V_\beta^{(0)}(e, \lambda) = \lambda$, (S1) is satisfied with $M_\lambda = \lambda$.

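We show (S2) for Model B, but a similar argument works for Model A as well. Since not transmitting is optimal at state 0, we have

$$V_\beta(0, \lambda) = \beta \int_{-\infty}^{\infty} \phi(w) V_\beta(w, \lambda)\, dw.$$

Let $V_\beta^{(1)}(e, \lambda)$ denote the value function of the strategy that transmits at time 0 and follows the optimal strategy from then on. Then

$$V_\beta^{(1)}(e, \lambda) = (1-\beta)\lambda + \beta \int_{-\infty}^{\infty} \phi(w) V_\beta(w, \lambda)\, dw = (1-\beta)\lambda + V_\beta(0, \lambda). \tag{28}$$

Since $V_\beta(e, \lambda) \le V_\beta^{(1)}(e, \lambda)$ and $V_\beta(0, \lambda) \ge 0$, from (28) we get that $(1-\beta)^{-1}[V_\beta(e, \lambda) - V_\beta(0, \lambda)] \le \lambda$. Hence (S2) is satisfied with $K_\lambda(e) = \lambda$.

By Proposition 5, $V_\beta(e, \lambda) \ge V_\beta(0, \lambda)$, hence (S3) is satisfied with $L_\lambda = 0$. ■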


Proof of Theorem 1 for $\beta = 1$: Since the value function $V_\beta(\cdot, \lambda)$ satisfies the SEN conditions for reference state $e_0 = 0$, the optimality of the threshold strategy for the long-term average setup follows from [42, Theorem 7.2.3] for Model A and [43, Theorem 5.4.3] for Model B, respectively. ■


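VI. Proof of Theorem 2

A. Preliminary results

Define operator $\mathcal{B}$ as follows:

• Model A: For any $v : \mathbb{Z} \to \mathbb{R}$, define operator $\mathcal{B}$ as

$$[\mathcal{B}v](e) := \sum_{w=-\infty}^{\infty} p_w\, v(ae + w), \quad \forall e \in \mathbb{Z}.$$

Or, equivalently,

$$[\mathcal{B}v](e) := \sum_{n=-\infty}^{\infty} p_{n-ae}\, v(n), \quad \forall e \in \mathbb{Z}.$$

• Model B: For any bounded $v : \mathbb{R} \to \mathbb{R}$, define operator $\mathcal{B}$ as

$$[\mathcal{B}v](e) := \int_{\mathbb{R}} \phi(w)\, v(ae + w)\, dw, \quad \forall e \in \mathbb{R}.$$

Or, equivalently,

$$[\mathcal{B}v](e) := \int_{\mathbb{R}} \phi(n - ae)\, v(n)\, dn, \quad \forall e \in \mathbb{R}.$$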


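As discussed previously, the error process $\{E_t\}_{t=0}^{\infty}$ is a controlled Markov process. Therefore, the functions $D_\beta^{(k)}$ and $N_\beta^{(k)}$ may be thought of as value functions when strategy $f^{(k)}$ is used. Thus, they satisfy the following fixed point equations: for $\beta \in (0,1)$,

$$D_\beta^{(k)}(e) = \begin{cases} \beta [\mathcal{B} D_\beta^{(k)}](0), & \text{if } |e| \ge k \\ (1-\beta) d(e) + \beta [\mathcal{B} D_\beta^{(k)}](e), & \text{if } |e| < k, \end{cases} \tag{29}$$

$$N_\beta^{(k)}(e) = \begin{cases} (1-\beta) + \beta [\mathcal{B} N_\beta^{(k)}](0), & \text{if } |e| \ge k \\ \beta [\mathcal{B} N_\beta^{(k)}](e), & \text{if } |e| < k. \end{cases} \tag{30}$$

Lemma 5 For $\beta \in (0,1]$, (29) and (30) have unique and bounded solutions $D_\beta^{(k)}(e)$ and $N_\beta^{(k)}(e)$ that

1) are even and increasing (on $\mathbb{X}_{\ge 0}$) in $e$ for all $k$,

2) satisfy the SEN conditions (see Proposition 6) and therefore

$$D_1^{(k)}(e) = \lim_{\beta \uparrow 1} D_\beta^{(k)}(e) \quad \text{and} \quad N_1^{(k)}(e) = \lim_{\beta \uparrow 1} N_\beta^{(k)}(e);$$

3) $D_\beta^{(k)}(e)$ is increasing in $k$ for all $e$ and $N_\beta^{(k)}(e)$ is strictly decreasing in $k$ for all $e$.

The proofs of 1) and 2) follow from arguments similar to those of Section V and are therefore omitted. The proof of 3) is given in Appendix D.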


B. Proof of Theorem 2

We prove the result for the discounted cost setup, $\beta \in (0,1)$. The result extends to the long-term average cost setup, $\beta = 1$, by using the vanishing discount approach similar to the argument given in Section V.

We first consider the case $k = 0$. In this case, the recursive definitions of $D_\beta^{(0)}$ and $N_\beta^{(0)}$, given by (29) and (30), simplify to the following:

$$D_\beta^{(0)}(e) = \beta [\mathcal{B} D_\beta^{(0)}](0)$$

and

$$N_\beta^{(0)}(e) = (1-\beta) + \beta [\mathcal{B} N_\beta^{(0)}](0).$$


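It can be easily verified that $D_\beta^{(0)}(e) = 0$ and $N_\beta^{(0)}(e) = 1$, $e \in \mathbb{X}$, satisfy the above equations. Also, $C_\beta^{(0)}(e; \lambda) = C_\beta(f^{(0)}, g^*; \lambda) = \lambda$. This proves the first part of the proposition.

For $k > 0$, let $\tau^{(k)}$ denote the stopping time when the Markov process in both Model A and B starting at state 0 at time $t = 0$ leaves the set $S^{(k)}$. Note that $\tau^{(0)} = 1$ and $\tau^{(\infty)} = \infty$. Then,

$$L_\beta^{(k)}(0) = \mathbb{E}\Big[ \sum_{t=0}^{\tau^{(k)}-1} \beta^t d(E_t) \,\Big|\, E_0 = 0 \Big], \tag{31}$$

$$M_\beta^{(k)}(0) = \mathbb{E}\Big[ \sum_{t=0}^{\tau^{(k)}-1} \beta^t \,\Big|\, E_0 = 0 \Big] = \frac{1 - \mathbb{E}[\beta^{\tau^{(k)}} \mid E_0 = 0]}{1 - \beta}, \tag{32}$$

$$D_\beta^{(k)}(0) = \mathbb{E}\Big[ (1-\beta) \sum_{t=0}^{\tau^{(k)}-1} \beta^t d(E_t) + \beta^{\tau^{(k)}} D_\beta^{(k)}(0) \,\Big|\, E_0 = 0 \Big], \tag{33}$$

$$N_\beta^{(k)}(0) = \mathbb{E}\Big[ \beta^{\tau^{(k)}} \big[ (1-\beta) + N_\beta^{(k)}(0) \big] \,\Big|\, E_0 = 0 \Big]. \tag{34}$$

Substituting (31) and (32) in (33) we get

$$D_\beta^{(k)}(0) = (1-\beta) L_\beta^{(k)}(0) + [1 - (1-\beta) M_\beta^{(k)}(0)]\, D_\beta^{(k)}(0).$$

Rearranging, we get that

$$D_\beta^{(k)}(0) = \frac{L_\beta^{(k)}(0)}{M_\beta^{(k)}(0)}.$$

Similarly, substituting (31) and (32) in (34) we get

$$N_\beta^{(k)}(0) = [1 - (1-\beta) M_\beta^{(k)}(0)]\,[(1-\beta) + N_\beta^{(k)}(0)].$$

Rearranging, we get that

$$N_\beta^{(k)}(0) = \frac{1}{M_\beta^{(k)}(0)} - (1-\beta).$$

The expression for $C_\beta^{(k)}(0; \lambda)$ follows from the definition.

VII. Proofs of results for Model A

A. Proof of Theorem 3

By Proposition 1, $k^*_\beta(\lambda) = \arg\inf_{k \ge 0} C_\beta^{(k)}(0; \lambda)$ is increasing in $\lambda$. Let $\mathbb{K}$ denote the set of all possible values of $k^*_\beta(\lambda)$. Since $k$ is integer-valued, the plot of $k^*_\beta$ vs $\lambda$ must be a staircase function as shown in Fig. 8. In particular, there exists an increasing sequence $\{\lambda_\beta^{(k_n)}\}_{k_n \in \mathbb{K}}$ such that for $\lambda \in (\lambda_\beta^{(k_{n-1})}, \lambda_\beta^{(k_n)}]$, $k^*_\beta(\lambda) = k_n$. We will show that for any $k_n$,

$$C_\beta^{(k_n)}(0; \lambda_\beta^{(k_n)}) = C_\beta^{(k_{n+1})}(0; \lambda_\beta^{(k_n)}). \tag{35}$$

Simplifying (35), we obtain the expression for $\lambda_\beta^{(k_n)}$ stated in the theorem.

Fig. 8: Plot of $k^*_\beta(\lambda)$ for Model A.

Proof of (35): For any $\lambda \in (\lambda_\beta^{(k_{n-1})}, \lambda_\beta^{(k_n)}]$, $C_\beta^{(k_n)}(0; \lambda) \le C_\beta^{(k_{n+1})}(0; \lambda)$. In particular, for $\lambda = \lambda_\beta^{(k_n)}$,

$$C_\beta^{(k_n)}(0; \lambda_\beta^{(k_n)}) \le C_\beta^{(k_{n+1})}(0; \lambda_\beta^{(k_n)}). \tag{36}$$

Similarly, for any $\lambda \in (\lambda_\beta^{(k_n)}, \lambda_\beta^{(k_{n+1})}]$, $C_\beta^{(k_{n+1})}(0; \lambda) \le C_\beta^{(k_n)}(0; \lambda)$. Since both terms are continuous in $\lambda$, taking limit as $\lambda \downarrow \lambda_\beta^{(k_n)}$, we get

$$C_\beta^{(k_{n+1})}(0; \lambda_\beta^{(k_n)}) \le C_\beta^{(k_n)}(0; \lambda_\beta^{(k_n)}). \tag{37}$$

Eq. (35) follows from combining (36) and (37).

1) Proof of Part 1): By definition of $\lambda_\beta^{(k_n)}$, the strategy $f^{(k_n)}$ is optimal for $\lambda \in (\lambda_\beta^{(k_{n-1})}, \lambda_\beta^{(k_n)}]$.

2) Proof of Part 2): Recall $C^*_\beta(\lambda) = \inf_{k \ge 0} C_\beta^{(k)}(0; \lambda)$. By definition, for $\lambda \ge 0$, $C_\beta^{(k)}(0; \lambda)$ is increasing and affine in $\lambda$. Therefore, its pointwise minimum (over $k$) is increasing and concave in $\lambda$.

As shown in part 1), for $\lambda \in (\lambda_\beta^{(k_{n-1})}, \lambda_\beta^{(k_n)}]$, $C^*_\beta(\lambda) = C_\beta^{(k_n)}(0; \lambda)$, which is linear (and continuous) in $\lambda$; hence, $C^*_\beta(\lambda)$ is piecewise linear. Finally, by (35), $C_\beta^{(k_n)}(0; \lambda_\beta^{(k_n)}) = C_\beta^{(k_{n+1})}(0; \lambda_\beta^{(k_n)})$. Therefore, at the corner points, $\lim_{\lambda \uparrow \lambda_\beta^{(k_n)}} C^*_\beta(\lambda) = \lim_{\lambda \downarrow \lambda_\beta^{(k_n)}} C^*_\beta(\lambda)$. Hence, $C^*_\beta(\lambda)$ is continuous in $\lambda$.

B. Proof of Theorem 4

Note that by definition, $\theta^* \in [0,1]$ and

$$\theta^* N_\beta(f^{(k^*)}, g^*) + (1 - \theta^*) N_\beta(f^{(k^*+1)}, g^*) = \alpha. \tag{38}$$

1) Proof of Part 1): The optimality of $(f^*, g^*)$ relies on the following characterization of the optimal strategy stated in [44, Proposition 1.2]. The characterization was stated for the long-term average setup but a similar result can be shown for the discounted case as well, for example, by using the approach of [45]. Also, see [46, Theorem 8.4.1] for a similar sufficient condition for a general constrained optimization problem.

A (possibly randomized) strategy $(f^\circ, g^\circ)$ is optimal for a constrained optimization problem with $\beta \in (0,1]$ if the following conditions hold:

(C1) $N_\beta(f^\circ, g^\circ) = \alpha$,

(C2) There exists a $\lambda^\circ \ge 0$ such that $(f^\circ, g^\circ)$ is optimal for $C_\beta(f, g; \lambda^\circ)$.

We will show that the strategies $(f^*, g^*)$ satisfy (C1) and (C2) with $\lambda^\circ = \lambda_\beta^{(k^*)}$.

$(f^*, g^*)$ satisfy (C1) due to (38). For $\lambda = \lambda_\beta^{(k^*)}$, both $f^{(k^*)}$ and $f^{(k^*+1)}$ are optimal for $C_\beta(f, g; \lambda)$. Hence, any strategy randomizing between them, in particular $f^*$, is also optimal for $C_\beta(f, g; \lambda)$. Hence $(f^*, g^*)$ satisfies (C2). Therefore, by [44, Proposition 1.2], $(f^*, g^*)$ is optimal for Problem 2.

2) Proof of Part 2): The expressions for $k^*$ and $\theta^*$ follow directly from the conditions above. The form of $D^*_\beta(\alpha)$ given in (21) follows immediately from the fact that $(f^*, g^*)$ is a Bernoulli randomized simple strategy.

$D^*_\beta(\alpha)$ is the solution to a constrained optimization problem with the constraint set $\{(f, g) : N_\beta(f, g) \le \alpha\}$. Therefore, it is decreasing and convex in the constraint $\alpha$. The optimality of $(f^*, g^*)$ implies the form of $D^*_\beta(\alpha)$ given in (21), from which piecewise linearity follows. Finally, by definition of $k^*$ and $\theta^*$, $\lim_{\alpha \uparrow N_\beta^{(k)}(0)} D^*_\beta(\alpha) = D_\beta^{(k)}(0) = \lim_{\alpha \downarrow N_\beta^{(k)}(0)} D^*_\beta(\alpha)$. Hence, $D^*_\beta(\alpha)$ is continuous in $\alpha$.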


VIII. Proofs of results for Model B

Lemma 6 In Model B, for $\beta \in (0,1]$,

1) $D_\beta^{(k)}$ and $N_\beta^{(k)}$ are continuous in $k$,

2) $N_\beta^{(k)}$ is strictly decreasing in $k$,

3) $D_\beta^{(k)}$ and $N_\beta^{(k)}$ are differentiable in $k$.

Proof: The proof follows from Lemma 1 and Theorem 2. ■

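A. Proof of Theorem 5

1) Proof of Part 1): The choice of $\lambda$ implies that $\partial_k C_\beta^{(k)}(0; \lambda) = 0$. Hence strategy $(f^{(k)}, g^*)$ is optimal for the given $\lambda$.

Note that (22) can also be written as $\lambda = -\partial_k D_\beta^{(k)}(0) / \partial_k N_\beta^{(k)}(0)$. By Lemma 6, $\partial_k N_\beta^{(k)}(0) < 0$ and by Lemma 5, $\partial_k D_\beta^{(k)}(0) \ge 0$. Hence, for any $k > 0$, $\lambda$ given by (22) is positive. This completes the first part of the proof.

2) Proof of Part 2): The monotonicity and concavity of $C^*_\beta(\lambda)$ follow from the same argument as in Model A.

Note that $k^*_\beta(\lambda) = \arg\inf_{k \ge 0} C_\beta^{(k)}(0; \lambda)$ can take a value $\infty$ (which corresponds to the strategy 'never communicate'). Thus, the domain of $k$ is $\mathbb{X}_{\ge 0} \cup \{\infty\}$, which is a compact set. Now, $C^*_\beta(\lambda) = \min_{k \in [0, \infty]} C_\beta^{(k)}(0; \lambda)$, where $C_\beta^{(k)}(0; \lambda)$ is continuous in both $\lambda$ and $k$. Since $C^*_\beta(\lambda)$ is the pointwise minimum of bounded continuous functions, where the minimization is over a compact set, it is continuous.

B. Proof of Theorem 6

1) Proof of Part 1): Recall conditions (C1), (C2), given in Section VII, for a strategy to be optimal for a constrained optimization problem. We will show that for a given $\alpha$, there exists a $k^*_\beta(\alpha) \in \mathbb{R}_{\ge 0}$ such that $(f^{(k^*_\beta(\alpha))}, g^*)$ satisfies conditions (C1) and (C2).

By Lemma 6, $N_\beta^{(k)}(0)$ is continuous and strictly decreasing in $k$. It is easy to see that $\lim_{k \to 0} N_\beta^{(k)}(0) = 1$ and $\lim_{k \to \infty} N_\beta^{(k)}(0) = 0$. Hence, for a given $\alpha \in (0,1)$, there exists a $k^*_\beta(\alpha)$ such that $N_\beta^{(k^*_\beta(\alpha))}(0) = N_\beta(f^{(k^*_\beta(\alpha))}, g^*) = \alpha$. Thus, $(f^{(k^*_\beta(\alpha))}, g^*)$ satisfies (C1).

Now, for $k^*_\beta(\alpha)$, we can find a $\lambda$ satisfying (22) and hence we have by Theorem 5 that strategy $(f^{(k^*_\beta(\alpha))}, g^*)$ is optimal for $C_\beta(f, g; \lambda)$, and therefore satisfies (C2); and is consequently optimal for Problem 2.

2) Proof of Part 2): By Lemma 6, $N(k) := N_\beta^{(k)}(0)$ is strictly decreasing and continuous in $k$. Therefore, $N^{-1}$ exists and is continuous. Now,

$$D^*_\beta(\alpha) = \min_{\{k \,:\, k \ge N^{-1}(\alpha)\}} D_\beta^{(k)}(0),$$

where, by Lemma 6, $D_\beta^{(k)}(0)$ is continuous in $k$. Thus, by Berge's maximum theorem, $D^*_\beta(\alpha)$ is continuous in $\alpha$.

C. Proof of Theorem 7

To prove the theorem, we first need to prove the following lemma.

Lemma 7 For the Gauss-Markov model (a special case of Model B), let $L_{\beta,\sigma}^{(k)}$ and $M_{\beta,\sigma}^{(k)}$ be the solutions of (11) and (12) respectively, when the variance of $W_t$ is $\sigma^2$. Then

$$L_{\beta,\sigma}^{(k)}(e) = \sigma^2 L_{\beta,1}^{(k/\sigma)}\Big(\frac{e}{\sigma}\Big), \qquad M_{\beta,\sigma}^{(k)}(e) = M_{\beta,1}^{(k/\sigma)}\Big(\frac{e}{\sigma}\Big), \tag{39}$$

$$D_{\beta,\sigma}^{(k)}(e) = \sigma^2 D_{\beta,1}^{(k/\sigma)}\Big(\frac{e}{\sigma}\Big), \qquad N_{\beta,\sigma}^{(k)}(e) = N_{\beta,1}^{(k/\sigma)}\Big(\frac{e}{\sigma}\Big). \tag{40}$$

Proof: Define $\hat{L}_\sigma^{(k)}(e) := \sigma^2 L_{\beta,1}^{(k/\sigma)}(e/\sigma)$. Now consider,

$$[\mathcal{B}_\sigma^{(k)} \hat{L}_\sigma^{(k)}](e) = \int_{-k}^{k} \phi_\sigma(n - ae)\, \hat{L}_\sigma^{(k)}(n)\, dn \overset{(a)}{=} \sigma^2 \int_{-k/\sigma}^{k/\sigma} \phi_1(z - ae/\sigma)\, L_{\beta,1}^{(k/\sigma)}(z)\, dz = \sigma^2 [\mathcal{B}_1^{(k/\sigma)} L_{\beta,1}^{(k/\sigma)}]\Big(\frac{e}{\sigma}\Big), \quad \forall e \in \mathbb{R},$$

where $\phi_\sigma$ denotes the density of $W_t$ with variance $\sigma^2$ and (a) uses a change of variables $n = \sigma z$. Therefore,

$$[\hat{L}_\sigma^{(k)} - \beta \mathcal{B}_\sigma^{(k)} \hat{L}_\sigma^{(k)}](e) = \sigma^2 [L_{\beta,1}^{(k/\sigma)} - \beta \mathcal{B}_1^{(k/\sigma)} L_{\beta,1}^{(k/\sigma)}]\Big(\frac{e}{\sigma}\Big) = \sigma^2 \frac{e^2}{\sigma^2} = e^2.$$

But the above equation has a unique solution $L_{\beta,\sigma}^{(k)}$. Therefore $\hat{L}_\sigma^{(k)} = L_{\beta,\sigma}^{(k)}$.

A similar argument may be used to prove the scaling of $M_{\beta,\sigma}^{(k)}$. The scaling of $D_{\beta,\sigma}^{(k)}$ and $N_{\beta,\sigma}^{(k)}$ follows from Theorem 2. ■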

Proof of Theorem 0 The theorem follows from Lemma [7] Theo¬ 
rem and elementary algebra. 




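IX. Proofs of results for Example 1

Lemma 8 Define for $\beta \in (0,1]$

$$K_\beta = -2 - \frac{1-\beta}{\beta p} \quad \text{and} \quad m_\beta = \cosh^{-1}(-K_\beta/2).$$

Then,

$$[Q_\beta^{(k)}]_{ij} = \frac{1}{\beta p} \frac{[A_\beta^{(k)}]_{ij}}{b_\beta^{(k)}},$$

where, for $\beta \in (0,1)$,

$$[A_\beta^{(k)}]_{ij} = \cosh((2k - |i - j|) m_\beta) - \cosh((i + j) m_\beta), \qquad b_\beta^{(k)} = 2 \sinh(m_\beta) \sinh(2k m_\beta);$$

and for $\beta = 1$,

$$[A_1^{(k)}]_{ij} = (k - \max\{i, j\})(k + \min\{i, j\}), \qquad b_1^{(k)} = 2k.$$

In particular, the elements $[Q_\beta^{(k)}]_{0j}$ are given as follows. For $\beta \in (0,1)$,

$$[Q_\beta^{(k)}]_{0j} = \frac{1}{\beta p} \frac{\cosh((2k - |j|) m_\beta) - \cosh(j m_\beta)}{2 \sinh(m_\beta) \sinh(2k m_\beta)}, \tag{41}$$

and for $\beta = 1$,

$$[Q_1^{(k)}]_{0j} = \frac{k - |j|}{2p}. \tag{42}$$

Proof: The matrix $I_{2k-1} - \beta P^{(k)}$ is a symmetric tridiagonal matrix given by

$$I_{2k-1} - \beta P^{(k)} = -\beta p \begin{bmatrix} K_\beta & 1 & 0 & \cdots & \cdots & 0 \\ 1 & K_\beta & 1 & 0 & \cdots & 0 \\ 0 & 1 & K_\beta & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & K_\beta & 1 \\ 0 & \cdots & \cdots & 0 & 1 & K_\beta \end{bmatrix}.$$

$Q_\beta^{(k)}$ is the inverse of the above matrix. The inverses of tridiagonal matrices of the above form with $K_\beta \le -2$ are computed in closed form in [47]. The result of the lemma follows from these results. ■

A. Proof of Lemma 2

By substituting the expression for $Q_\beta^{(k)}$ from Lemma 8 in the expressions for $L_\beta^{(k)}$ and $M_\beta^{(k)}$ from Proposition 2, we get that

1) For $\beta \in (0,1)$,

$$L_\beta^{(k)}(0) = \frac{\sinh(k m_\beta) - k \sinh(m_\beta)}{4 \beta p \sinh^2(m_\beta/2) \sinh(m_\beta) \cosh(k m_\beta)}, \qquad M_\beta^{(k)}(0) = \frac{\sinh^2(k m_\beta/2)}{2 \beta p \sinh^2(m_\beta/2) \cosh(k m_\beta)}.$$

2) For $\beta = 1$,

$$L_1^{(k)}(0) = \frac{k(k^2 - 1)}{6p}, \qquad M_1^{(k)}(0) = \frac{k^2}{2p}.$$

The results of the lemma follow using the above expressions and Theorem 2. The expression for $\lambda_1^{(k)}$ is obtained by plugging the expressions of $D_1^{(k)}(0)$ and $N_1^{(k)}(0)$ into its defining equation.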
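The $\beta = 1$ case of Lemma 8 can be checked exactly with rational arithmetic. The sketch below assumes the birth-death chain restricted to $S^{(k)} = \{-(k-1), \dots, k-1\}$, so that $I - P^{(k)}$ has diagonal entries $2p$ and off-diagonal entries $-p$; it verifies that the claimed row $[Q_1^{(k)}]_{0j} = (k - |j|)/(2p)$ multiplied by $I - P^{(k)}$ gives the unit vector at state 0. The function names are illustrative.

```python
from fractions import Fraction

def q_row(k, p):
    # Claimed row 0 of Q_1^(k), indexed by j in S^(k) = {-(k-1), ..., k-1}
    return {j: Fraction(k - abs(j)) / (2 * p) for j in range(-(k - 1), k)}

def row_times_matrix(k, p):
    # Entry j of q (I - P^(k)); terms outside S^(k) are treated as zero
    q = q_row(k, p)
    return {j: 2 * p * q[j] - p * q.get(j - 1, 0) - p * q.get(j + 1, 0)
            for j in q}

def is_inverse_row(k, p):
    product = row_times_matrix(k, p)
    return all(value == (1 if j == 0 else 0) for j, value in product.items())

p = Fraction(3, 10)
checks = [is_inverse_row(k, p) for k in range(1, 8)]
m_5 = sum(q_row(5, p).values())               # M_1^(5)(0) = sum_j [Q]_{0j}
l_5 = sum(abs(j) * v for j, v in q_row(5, p).items())  # with d(e) = |e|
print(checks, m_5, l_5)
```

Summing the row reproduces $M_1^{(k)}(0) = k^2/(2p)$, and weighting it by $|j|$ (an assumed distortion $d(e) = |e|$) reproduces $L_1^{(k)}(0) = k(k^2-1)/(6p)$, consistent with the expressions used in the proof of Lemma 2.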

X. Conclusion 

We characterize two fundamental limits of remote estimation of 
autoregressive Markov processes under communication constraints. 
First, when each transmission is costly, we characterize the minimum 
achievable cost of communication plus estimation error. Second, 
when there is a constraint on the average number of transmissions, 
we characterize the minimum achievable estimation error. 

We also identify transmission and estimation strategies that achieve 
these fundamental limits. The structure of these optimal strategies 
had been previously identified by using dynamic programming for 
decentralized stochastic control systems. In particular, the optimal 
transmission strategy is to transmit when the estimation error process 
exceeds a threshold and the optimal estimation strategy is to select 
the transmitted state as the estimate, whenever there is a transmission. 
We use ideas based on renewal theory to identify the performance 
of a generic strategy that has such a structure. For the case of costly 
communication, we identify the value of communication cost for 
which a particular threshold-based strategy is optimal; for the case 
of constrained communication, we identify (possibly randomized)
threshold-based strategies that achieve the communication constraint. 

These results are derived under idealized assumptions on the 
communication channel: communication is noiseless and without any 
constraint on the transmission rate or the transmission bandwidth. Under these assumptions, the error process resets after each transmission
(see Remark]^. This reset property is critical to derive the structure 
of optimal transmission and estimation strategies (Theoremsand [^. 
In the absence of such a structural result, the solution methodology 
developed in this paper does not work and the optimal transmission 
and estimation strategies have to be identified by numerically solving 
the (decentralized) dynamic programs described in © © 

Having said that, the transmission and estimation strategies de¬ 
scribed in Theorems [T] and may be used as heuristic sub-optimal 
strategies when the communication channel does not satisfy the 
idealized assumptions described above. In that case, it may be 
possible to use the solution methodology developed in this paper 
to obtain performance bounds on such strategies. 

A similar remark holds for multi-dimensional autoregressive processes. It is reasonable to expect (although we are not aware of a proof
of this statement) that for multi-dimensional autoregressive processes, 
the optimal estimation strategy will be similar to that described in 
Theorems and while the optimal transmission strategy will be 
to transmit when the error process lies outside a (multi-dimensional) 
ellipsoid. The performance of such strategies can be evaluated using 
the solution methodology developed in this paper. The renewal 
relationships derived in Theorem also hold for multi-dimensional 


autoregressive processes. The only difference is that $L_\beta^{(k)}(0)$ and $M_\beta^{(k)}(0)$ are computed by solving multi-dimensional Fredholm integral equations of the second kind. The optimal transmission strategies can then be computed by solving multi-dimensional versions of (22) (for costly communication) and (23) (for constrained communication). However, it is not immediately clear whether these equations
will have a unique solution. Further investigation is required to obtain 
algorithms that identify the optimal transmission ellipsoid. 

Finally, the solution methodology developed in this paper to identify optimal thresholds is also of independent interest. In various applications of Markov decision processes, threshold strategies are optimal. The approach developed in this paper is directly applicable to such models.

Appendix A
Proof of Lemma 1

Let $\|\cdot\|_\infty$ denote the sup-norm, i.e., for any $v : S^{(k)} \to \mathbb{R}$,

$$\|v\|_\infty = \sup_{e \in S^{(k)}} |v(e)|.$$

To prove the lemma, let us first prove the following:

Lemma 9 For $\beta \in (0,1)$, for both Models A and B, the operator $\beta \mathcal{B}^{(k)}$ is a contraction, i.e., for any $v : S^{(k)} \to \mathbb{R}$,

$$\|\beta \mathcal{B}^{(k)} v\|_\infty \le \beta \|v\|_\infty.$$

Thus, for any bounded $h : S^{(k)} \to \mathbb{R}$, the equation

$$v = h + \beta \mathcal{B}^{(k)} v \tag{43}$$

has a unique bounded solution $v$. In addition, if $h$ is continuous, then $v$ is continuous.

Proof: We state the proof for Model B. The proof for Model A is similar. By the definition of sup-norm, we have that for any bounded $v$

$$\|\beta \mathcal{B}^{(k)} v\|_\infty = \beta \sup_{e \in (-k, k)} \Big| \int_{-k}^{k} \phi(w - ae)\, v(w)\, dw \Big| \le \beta \sup_{e \in (-k, k)} \|v\|_\infty \int_{-k}^{k} \phi(w - ae)\, dw \le \beta \|v\|_\infty \quad \text{(since $\phi$ is a pdf)}.$$

Hence, $\beta \mathcal{B}^{(k)}$ is a contraction.

Now, consider the operator $\mathcal{B}'$ given as: $\mathcal{B}' v = h + \beta \mathcal{B}^{(k)} v$. Then we have,

$$\|\mathcal{B}' v_1 - \mathcal{B}' v_2\|_\infty = \beta \|\mathcal{B}^{(k)}(v_1 - v_2)\|_\infty \le \beta \|v_1 - v_2\|_\infty.$$

Since $\beta \in (0,1)$ and the space of bounded real-valued functions is complete, by the Banach fixed point theorem, $\mathcal{B}'$ has a unique fixed point.

If $h$ is continuous, we can define $\mathcal{B}^{(k)}$ and $\mathcal{B}'$ as operators on the space of continuous and bounded real-valued functions (which is complete). Hence, the continuity of the fixed point also follows from the Banach fixed point theorem. ■

Proof of Lemma 1

The solutions of equations (11) and (12) exist due to Lemma 9.

(a) Consider $k, l \in \mathbb{X}_{>0}$ such that $k < l$. A sample path starting from $e \in S^{(k)}$ must escape $S^{(k)}$ before it escapes $S^{(l)}$. Thus $L_\beta^{(l)}(e) \ge L_\beta^{(k)}(e)$. In addition, the above inequality is strict because $W_t$ has a unimodal distribution. A similar argument holds for $M_\beta^{(k)}$.

(b) The continuity and differentiability can be proved from elementary algebra. See the supplementary material for details.

(c) The limit holds since $L_\beta^{(k)}(e)$ and $M_\beta^{(k)}(e)$ are continuous functions of $\beta$.
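The contraction property of Lemma 9 also gives a practical way to compute $L_\beta^{(k)}$ and $M_\beta^{(k)}$: iterate $v \leftarrow h + \beta \mathcal{B}^{(k)} v$ until it stops changing. The sketch below does this for the birth-death chain with $a = 1$, taking $h(e) = |e|$ for $L$ and $h(e) = 1$ for $M$ (the distortion $d(e) = |e|$ and the function names are assumptions of this illustration), and then applies the renewal relationships $D = L/M$ and $N = 1/M - (1-\beta)$.

```python
def fixed_point(h, k, beta, p, tol=1e-13):
    # Successive approximation for v = h + beta * B^(k) v on
    # S^(k) = {-(k-1), ..., k-1}; values outside S^(k) are zero.
    states = range(-(k - 1), k)
    v = {e: 0.0 for e in states}
    while True:
        new_v = {
            e: h(e) + beta * (p * v.get(e - 1, 0.0)
                              + (1.0 - 2.0 * p) * v[e]
                              + p * v.get(e + 1, 0.0))
            for e in states
        }
        gap = max(abs(new_v[e] - v[e]) for e in states)
        v = new_v
        if gap < tol:
            return v

beta, p, k = 0.9, 0.3, 5
L0 = fixed_point(lambda e: float(abs(e)), k, beta, p)[0]   # L_beta^(k)(0)
M0 = fixed_point(lambda e: 1.0, k, beta, p)[0]             # M_beta^(k)(0)
D0 = L0 / M0                      # renewal relationship for the distortion
N0 = 1.0 / M0 - (1.0 - beta)      # renewal relationship for the transmissions
print(round(L0, 4), round(M0, 4), round(D0, 4), round(N0, 4))
```

For $k = 5$, $\beta = 0.9$ and $p = 0.3$ the iteration should agree with the closed forms of Lemma 2, giving $D \approx 1.1844$ and $N \approx 0.0111$.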


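Appendix B
Proof of Proposition 1

1) Consider $l > k$. Then $C_\beta^{(l)}(0; \lambda) - C_\beta^{(k)}(0; \lambda) = D_\beta^{(l)}(0) - D_\beta^{(k)}(0) - \lambda\big(N_\beta^{(k)}(0) - N_\beta^{(l)}(0)\big)$. By Lemma 1 and Theorem 2, $N_\beta^{(k)}(0) - N_\beta^{(l)}(0)$ is positive, hence $C_\beta^{(l)}(0; \lambda) - C_\beta^{(k)}(0; \lambda)$ is decreasing in $\lambda$. Hence $C_\beta^{(k)}(0; \lambda)$ is submodular in $(k, \lambda)$.

2) Note that $k^*_\beta(\lambda) = \arg\inf_{k \ge 0} C_\beta^{(k)}(0; \lambda)$ can take a value $\infty$ (which corresponds to the strategy 'never communicate'). Thus, the domain of $k$ is $\mathbb{X}_{\ge 0} \cup \{\infty\}$, which is compact. Hence, by [48, Theorem 2.8.2], $k^*_\beta$ is increasing in $\lambda$.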


Appendix C 

Proofs of Propositions[4]and[5] 

We prove the results for Model A when the horizon T is finite. 
The results then follow by taking limits as T —> oo. The proofs for 
Model B are almost identical. 

The value function for the finite horizon setup for /3 G (0, l]is 
given by Vp^r+i = 0 and for f = T, • • ■ ,1 

OO 

A) = min |(1 -/3)A-b/3 ^ p„l/a,t+i(n; A), 

n = —OO 
oo 

{1-P)d{e) + P ^ p„_ae'K8,t+i(n; A)|. (44) 

n = —OO 

The value functions and V^~'^ are defined similarly. 

For ease of notation, we drop /3 and A in the rest of the discussion 
in this Appendix. 

Lemma 10 The value functions Vt{-), Vj*''*'*(•) and are even. 

Proof: For all a G X, the per-step costs d{e) and A are even 
and the transition probabilities Pen{0) = Pn-ae and Pen (1) = Pn 
satisfy Pen{u) = P(-e)(-n)(u) for u G {0,1}. Therefore, Vt(e) is 
even \A9\ Theorem 1]. A similar argument holds for and 

m 

Lemma 11 For the finite horizon setup, = Vi~\e). 


Proof: We prove the result by backward induction. The result is 
trivially true for T + 1 as (e) = (e) = 0, which forms the 

basis of the induction. Assume (e) = (e) for all e G X. 

Define 

OO OO 

= Ep—(^)- 


Then 


E P—E P-n-aeH4+\-n) 

n = —OO —n= —oo 

OO oo 

Y,P^ + aeV,^+l{n) EP"+-^4'i W = 


n= —OO 


where (a) uses p and are even and (b) uses the induction 

hypothesis. Substituting this back in the definition of and 

Vt~\e), we get that V’/^^(e) = Vt~\e). Therefore, the result is 
true by induction. ■ 

Lemma 12 For m, e G X>o, define 

Q(m|e, 0) = E pn — ae and Q(m\e,l)= E P"- 

n.:|n[>m n:|n|>m 

Then, for all e, m G X>o and a > 0, Q{m\e, 0) and Q(m\e, 1) are 
increasing in e. 

We will prove this Lemma later. 


Definition 2 A function /: X —> E, w called even and increasing 
on X>o if for all x G X>o, f{x) = f{—x) and f{x) < f{x + 1). 

Lemma 13 The value function $V_t(e)$ is even and increasing on $\mathbb{X}_{\ge 0}$. 


Proof: We have already shown that $V_t(e)$ is even. For $a > 0$, 
the properties described in the proof of Lemma 10 and the statement of 
Lemma 12 imply that $V_t(e)$ is even and increasing [49, Theorem 1]. 
Now, Lemma 11 implies that $V_t(e)$ is also even and increasing for 
$a < 0$. ■ 

Proofs of the propositions: The results follow from 
Lemmas 11 and 13 by taking the limit $T \to \infty$, since equality is 
preserved under limits. ■ 

Proof of Lemma 12: $Q(m \mid e, 1)$ is independent of $e$. Define 
$R(m \mid e) = \sum_{n : |n| \le m} p_{n-e}$. Then, $Q(m \mid e, 0) = 1 - R(m \mid ae)$. To 
show that $Q(m \mid e, 0)$ is increasing in $e$, it suffices to show that $R(m \mid ae) \ge 
R(m \mid ae + 1)$ (which implies that $R(m \mid ae) \ge R(m \mid ae + a)$). 

Now consider 

$$R(m \mid ae) - R(m \mid ae + 1) = p_{m-ae} - p_{-m-ae-1} = p_{m-ae} - p_{m+ae+1}.$$

If $m \ge ae$, then $0 \le m - ae < m + ae + 1$; hence, $p_{m-ae} \ge 
p_{m+ae+1}$. If $m < ae$, then $0 < ae - m < m + ae + 1$; hence, 
$p_{m-ae} = p_{ae-m} \ge p_{m+ae+1}$. Thus, in both cases, $R(m \mid ae) \ge 
R(m \mid ae + 1)$. ■ 
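As a quick numerical illustration of Lemma 12, the following sketch evaluates $Q(m \mid e, 0)$ through the identity $Q(m \mid e, 0) = 1 - R(m \mid ae)$ used in the proof. The function name `q_no_transmit` and the dictionary representation of the innovation distribution are assumptions made for the example; the integer-valued setting of Model A is assumed.

```python
def q_no_transmit(p, m, e, a):
    """Q(m | e, 0) = sum over |n| > m of p_{n - a e}.

    Computed as 1 - R(m | a e), where R(m | x) = sum over |n| <= m of p_{n - x}.
    p maps integers to probabilities; missing keys have probability zero.
    """
    r = sum(p.get(n - a * e, 0.0) for n in range(-m, m + 1))
    return 1.0 - r


if __name__ == "__main__":
    # A symmetric, unimodal innovation distribution (illustrative values).
    p = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
    for e in range(4):
        print(e, q_no_transmit(p, m=1, e=e, a=1))
```

For an even and unimodal `p` and `a > 0`, the printed values should be nondecreasing in `e`, which is the monotonicity the lemma asserts.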


Appendix D 

Proof of Part 3) of Lemma[^ 

By Lemma 1 is strictly increasing in k\ therefore, by 

Theorem 2, $N^{(k)}_\beta(e)$ is strictly decreasing in $k$. 

We prove the monotonicity of $D^{(k)}_\beta(e)$ in $k$ for Model A for $\beta \in 
(0, 1)$. The result for $\beta = 1$ follows by taking the limit $\beta \uparrow 1$. The result 
for Model B is similar. Based on Lemma 11, we restrict attention to 
$a > 0$. 

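For any $\beta \in (0, 1)$ and $k \in \mathbb{Z}_{\ge 0}$, define the operator $\mathcal{T}^{(k)} \colon (\mathbb{Z} \to 
\mathbb{R}) \to (\mathbb{Z} \to \mathbb{R})$ as follows. For any $D \colon \mathbb{Z} \to \mathbb{R}$, 

$$[\mathcal{T}^{(k)} D](e) = \begin{cases} \beta [\mathcal{B} D](0), & \text{if } |e| \ge k, \\ (1 - \beta) d(e) + \beta [\mathcal{B} D](e), & \text{if } |e| < k. \end{cases}$$

This operator is the Bellman operator for evaluating strategy $f^{(k)}$. 
Hence, it is a contraction and $D^{(k)}_\beta$ is the unique fixed point of $\mathcal{T}^{(k)}$. 
Define $D^{(k,0)}_\beta = D^{(k)}_\beta$, and for $m \in \mathbb{Z}_{\ge 0}$, $D^{(k,m+1)}_\beta = \mathcal{T}^{(k+1)} D^{(k,m)}_\beta$. 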


From Lemma [7^ and (49 [ Lemma 2], we get that for any e G Z>o, 


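$$\sum_{n=-\infty}^{\infty} p_{n-ae} D^{(k)}_\beta(n) \ge \sum_{n=-\infty}^{\infty} p_n D^{(k)}_\beta(n),$$

or equivalently, $[\mathcal{B} D^{(k)}_\beta](e) \ge [\mathcal{B} D^{(k)}_\beta](0)$. 

For $|e| = k$, $D^{(k,1)}_\beta(e) = (1 - \beta) d(e) + \beta [\mathcal{B} D^{(k)}_\beta](e)$ and 
$D^{(k)}_\beta(e) = \beta [\mathcal{B} D^{(k)}_\beta](0)$; hence, $D^{(k,1)}_\beta(e) \ge D^{(k)}_\beta(e)$. For $|e| \ne k$, 
$D^{(k,1)}_\beta(e) = D^{(k)}_\beta(e)$ because both terms have the same expression. 
Hence, for all $e \in \mathbb{Z}$, 

$$D^{(k,1)}_\beta(e) \ge D^{(k)}_\beta(e), \quad\text{or}\quad D^{(k,1)}_\beta \ge D^{(k,0)}_\beta.$$

If we apply the operator $\mathcal{T}^{(k+1)}$ to both sides, the monotonicity of 
$\mathcal{T}^{(k+1)}$ implies that $D^{(k,2)}_\beta \ge D^{(k,1)}_\beta$. Proceeding this way, 
we get that for any $m > 0$, 

$$D^{(k,m)}_\beta \ge D^{(k)}_\beta. \tag{46}$$

Note that $\lim_{m \to \infty} D^{(k,m)}_\beta = D^{(k+1)}_\beta$ because $D^{(k+1)}_\beta$ is the unique 
fixed point of the operator $\mathcal{T}^{(k+1)}$. Thus, taking the limit $m \to \infty$ 
in (46), we get that $D^{(k+1)}_\beta(e) \ge D^{(k)}_\beta(e)$. ■ 
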
Appendix E 

Proof of the structural results 

The results of (T) relied on the notion of ASU (almost symmetric 
and unimodal) distributions introduced in Q. 

Definition 3 (Almost symmetric and unimodal distribution) A 
probability distribution $\mu$ on $\mathbb{Z}$ is almost symmetric and unimodal 
(ASU) about a point $a \in \mathbb{Z}$ if for every $n \in \mathbb{Z}_{\ge 0}$, 

$$\mu_{a+n} \ge \mu_{a-n} \ge \mu_{a+n+1}.$$

A probability distribution that is ASU around 0 and even (i.e., 
$\mu_n = \mu_{-n}$) is called ASU and even. Note that the definition of ASU 
and even is equivalent to even and decreasing on $\mathbb{Z}_{\ge 0}$. 

Definition 4 (ASU Rearrangement) The ASU rearrangement of a 
probability distribution $\mu$, denoted by $\mu^+$, is a permutation of $\mu$ such 
that for every $n \in \mathbb{Z}_{\ge 0}$, 

$$\mu^+_n \ge \mu^+_{-n} \ge \mu^+_{n+1}.$$


For any $n \in \mathbb{Z}_{\ge 0}$, let $r^{(n)}$ denote the rectangular function from $-n$ 
to $n$, i.e., 

$$r^{(n)}(e) = \begin{cases} 1, & \text{if } |e| \le n, \\ 0, & \text{otherwise.} \end{cases}$$

Note that any ASU and even distribution $\mu$ may be written as a 
sum of rectangular functions as follows: 

$$\mu = \sum_{n=0}^{\infty} (\mu_n - \mu_{n+1}) r^{(n)}.$$

It should be noted that $\mu_n - \mu_{n+1} \ge 0$ because $\mu$ is ASU and even. 
$\nu$ may also be written in a similar form. 

The convolution of any two rectangular functions $r^{(n)}$ and $r^{(m)}$ is 
ASU and even. Therefore, by the distributive property of convolution, 
the convolution of $\mu$ and $\nu$ is also ASU and even. 

The proof for the general $a \in \mathbb{Z}$ follows from the following facts: 

1) Shifting a distribution is equivalent to convolution with a shifted 
delta function. 

2) Convolution is commutative and associative. 
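The rectangular decomposition and the convolution argument can be checked numerically on small examples. The sketch below is illustrative only: distributions are represented as dictionaries with finite support, and the helper names (`rect`, `decompose`, `rebuild`, `convolve`, `is_asu_even`) are hypothetical.

```python
def rect(n):
    """Rectangular function r^(n): equal to 1 on {-n, ..., n}, 0 elsewhere."""
    return {e: 1.0 for e in range(-n, n + 1)}


def decompose(mu):
    """Weights mu_n - mu_{n+1} of r^(n), for an ASU and even mu."""
    K = max(mu)  # largest index in the (symmetric) support
    return {n: mu[n] - mu.get(n + 1, 0.0) for n in range(K + 1)}


def rebuild(weights):
    """Evaluate sum_n weights[n] * r^(n) pointwise."""
    out = {}
    for n, c in weights.items():
        for e in range(-n, n + 1):
            out[e] = out.get(e, 0.0) + c
    return out


def convolve(mu, nu):
    """Discrete convolution of two finitely supported sequences."""
    out = {}
    for i, x in mu.items():
        for j, y in nu.items():
            out[i + j] = out.get(i + j, 0.0) + x * y
    return out


def is_asu_even(mu, tol=1e-12):
    """Check that mu is even and decreasing on the nonnegative integers."""
    K = max(abs(e) for e in mu)
    g = lambda n: mu.get(n, 0.0)
    return all(
        abs(g(n) - g(-n)) <= tol and g(n) + tol >= g(n + 1)
        for n in range(K + 1)
    )
```

For an ASU and even `mu`, `rebuild(decompose(mu))` should reproduce `mu` up to rounding, and `is_asu_even(convolve(mu, nu))` should hold whenever both inputs are ASU and even.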


We now introduce the notion of majorization for distributions 
supported over Z, as defined in 

Definition 5 (Majorization) Let $\mu$ and $\nu$ be two probability 
distributions defined over $\mathbb{Z}$. Then $\mu$ is said to majorize $\nu$, which is denoted 
by $\mu \succeq_m \nu$, if for all $n \in \mathbb{Z}_{\ge 0}$, 

$$\sum_{i=-n}^{n} \mu^+_i \ge \sum_{i=-n}^{n} \nu^+_i,$$

$$\sum_{i=-n}^{n+1} \mu^+_i \ge \sum_{i=-n}^{n+1} \nu^+_i.$$
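Definitions 4 and 5 can be made concrete with a short sketch for finitely supported distributions. The code is an illustration under assumptions: distributions are dictionaries from integers to probabilities, and the function names `asu_rearrangement` and `majorizes` are hypothetical.

```python
def asu_rearrangement(mu):
    """Place the values of mu, in decreasing order, at 0, 1, -1, 2, -2, ...

    The result satisfies mu+_n >= mu+_{-n} >= mu+_{n+1} for all n >= 0.
    """
    values = sorted(mu.values(), reverse=True)
    out = {}
    for rank, v in enumerate(values):
        # Odd ranks go to positive indices, even ranks to nonpositive ones.
        pos = (rank + 1) // 2 if rank % 2 else -(rank // 2)
        out[pos] = v
    return out


def majorizes(mu, nu, tol=1e-12):
    """Check Definition 5 for finitely supported mu and nu."""
    mu_plus, nu_plus = asu_rearrangement(mu), asu_rearrangement(nu)

    def window(dist, lo, hi):
        return sum(dist.get(i, 0.0) for i in range(lo, hi + 1))

    n_max = max(len(mu), len(nu))
    return all(
        window(mu_plus, -n, n) + tol >= window(nu_plus, -n, n)
        and window(mu_plus, -n, n + 1) + tol >= window(nu_plus, -n, n + 1)
        for n in range(n_max)
    )
```

For example, a distribution concentrated on a few points majorizes a flatter one with the same number of atoms, while the converse fails.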

The structure of optimal estimator in Theorem 8 were proved in 
two steps in Q. The first step relied on the following two results. 

Lemma 14 Let $\mu$ and $\nu$ be probability distributions with finite 
support defined over $\mathbb{Z}$. If $\mu$ is ASU and even and $\nu$ is ASU about 
$a$, then the convolution $\mu * \nu$ is ASU about $a$. 

Lemma 15 Let $\mu$, $\nu$, and $\xi$ be probability distributions with finite 
support defined over $\mathbb{Z}$. If $\mu$ is ASU and even, $\nu$ is ASU, and $\xi$ is 
arbitrary, then $\nu \succeq_m \xi$ implies that $\mu * \nu \succeq_m \mu * \xi$. 

These results were originally proved in Q and were stated as 
Lemmas 5 and 6 in jTJ. 

The second step (in the proof of structure of optimal estimator in 
Theorem 8) in QJ relied on the following result. 

Lemma 16 Let $\mu$ be a probability distribution with finite support 
defined over $\mathbb{Z}$ and $f \colon \mathbb{Z} \to \mathbb{R}_{\ge 0}$. Then, 

$$\sum_{n=-\infty}^{\infty} f(n) \mu_n \le \sum_{n=-\infty}^{\infty} f^+(n) \mu^+_n.$$
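The inequality in Lemma 16 can be illustrated on a finite window. The sketch below assumes that both $f$ and $\mu$ are given as equal-length lists of values over the same finite index set (treated as zero elsewhere); since both ASU rearrangements place the $k$-th largest value at the same position, the right-hand side reduces to pairing the sorted values rank by rank. The function names are hypothetical.

```python
from itertools import permutations


def plain_sum(f, mu):
    """Left-hand side: sum_n f(n) * mu_n over the common index window."""
    return sum(x * y for x, y in zip(f, mu))


def rearranged_sum(f, mu):
    """Right-hand side: sum_n f+(n) * mu+_n.

    Both rearrangements put the k-th largest value at the same index,
    so the sum pairs values of equal rank.
    """
    return sum(
        x * y
        for x, y in zip(sorted(f, reverse=True), sorted(mu, reverse=True))
    )


def best_permutation_sum(f, mu):
    """Brute-force maximum of sum_n f(n) * mu_{pi(n)} over permutations pi."""
    return max(plain_sum(f, perm) for perm in permutations(mu))
```

On small inputs, `best_permutation_sum` should agree with `rearranged_sum` up to rounding, which is a brute-force confirmation that no arrangement of the values of $\mu$ beats the rearranged pairing.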

We generalize the results of Lemmas 14, 15, and 16 to distributions 
over $\mathbb{Z}$ with possibly countable support. With these generalizations, 
we can follow the same two-step approach of JT] to prove the 
structure of optimal estimator as given in Theorem 8. 

The structure of optimal transmitter in Theorem 8 in only relied 
on the structure of optimal estimator. The exact same proof works in 
our model as well. 

A. Generalization of Lemma 14 to distributions supported over $\mathbb{Z}$ 

The proof argument is similar to that presented in Lemma 6.2]. 
We first prove the result for $a = 0$. Assume that $\nu$ is ASU and even. 


B. Generalization of Lemma 15 to distributions supported over $\mathbb{Z}$ 
We follow the proof idea of Theorem II. 1]. For any probability 

distribution $\mu$, we can find distinct indices $i_j$, $|j| \le n$, such that $\mu(i_j)$, 
$|j| \le n$, are the $2n + 1$ largest values of $\mu$. Define 

$$\mu_n(i_j) = \mu(i_j)$$

for $|j| \le n$ and 0 otherwise. Clearly, $\mu_n \uparrow \mu$ and if $\mu$ is ASU and 
even, so is $\mu_n$. 

Now consider the distributions $\mu$, $\nu$, and $\xi$ from Lemma 15 but 
without the restriction that they have finite support. For every $n \in 
\mathbb{Z}_{\ge 0}$, define $\mu_n$, $\nu_n$, and $\xi_n$ as above. Note that all of these have 
finite support, $\mu_n$ is ASU and even, and $\nu_n$ is ASU. Furthermore, 
since the definition of majorization remains unaffected by the truncation 
described above, $\nu_n \succeq_m \xi_n$. Therefore, by Lemma 15, 

$$\mu_n * \nu_n \succeq_m \mu_n * \xi_n.$$

By taking the limit over $n$ and using the monotone convergence theorem, 
we get 

$$\mu * \nu \succeq_m \mu * \xi.$$

C. Generalization of Lemma 16 to distributions supported over $\mathbb{Z}$ 
This is an immediate consequence of Theorem II. 1]. 

Appendix F 

Proof of (b) of Lemma I 

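Note that for any bounded v, $L^{(k)}_\beta(e)$ is bounded and increasing 
in $k$. We show that $L^{(k)}_\beta(e)$ is continuous and differentiable in $k$. 
A similar argument holds for $M^{(k)}_\beta(e)$. 

We show the differentiability in $k$. Continuity follows from the 
fact that differentiable functions are continuous. Note that $L^{(k)}_\beta(e)$ 
and $M^{(k)}_\beta(e)$ are even functions of $e$. Now, for any $\varepsilon > 0$ we have 

$$\begin{aligned}
L^{(k+\varepsilon)}_\beta(e) - L^{(k)}_\beta(e)
&= \beta \int_{-k}^{k} \phi(w - ae) \big[ L^{(k+\varepsilon)}_\beta(w) - L^{(k)}_\beta(w) \big] \, dw
 + 2\beta \int_{k}^{k+\varepsilon} \phi(w - ae) L^{(k+\varepsilon)}_\beta(w) \, dw \\
&= \beta \int_{-k}^{k} \phi(w - ae) \big[ L^{(k+\varepsilon)}_\beta(w) - L^{(k)}_\beta(w) \big] \, dw
 + 2\beta \phi(k - ae) L^{(k+\varepsilon)}_\beta(k) \varepsilon + O(\varepsilon^2).
\end{aligned}$$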
Let $R^{(k)}_\beta(e, w; a)$ be the resolvent of $\phi$, as given in (16). Then, 

$$L^{(k+\varepsilon)}_\beta(e) - L^{(k)}_\beta(e) = 2\beta \int_{-k}^{k} R^{(k)}_\beta(e, w; a) \phi(k - ae) L^{(k+\varepsilon)}_\beta(w) \varepsilon \, dw + O(\varepsilon^2).$$

This implies that 

$$\frac{\big| L^{(k+\varepsilon)}_\beta(e) - L^{(k)}_\beta(e) \big|}{|\varepsilon|} \le 2 \|\phi\|_\infty \big\| L^{(k+\varepsilon)}_\beta \big\|_\infty \left| \int_{-k}^{k} \beta R^{(k)}_\beta(e, w; a) \, dw \right| + O(\varepsilon).$$

Since the corresponding integral operator is a contraction, the value of the integral in the first term 
on the right hand side of the above inequality is less than 1 and the 
result follows from the definition of differentiability. 